Using Hive EXPLAIN (Execution Plan)

SungWookKang 2020. 9. 22. 14:27

· Version : Hive

In Hive, the EXPLAIN command shows the execution plan of a query, that is, how Hive translates the query into MapReduce jobs. To get the plan, put the EXPLAIN keyword in front of the query, as in the script below.

EXPLAIN
SELECT col_1, date_local, count(*) AS cnt
FROM tbl_a
WHERE col_1 = 'aaa.com'
GROUP BY date_local, col_1;

The execution plan below is the output returned for the query above.

 1 STAGE DEPENDENCIES:
 2   Stage-1 is a root stage
 3   Stage-0 depends on stages: Stage-1

 5 STAGE PLANS:
 6   Stage: Stage-1
 7     Map Reduce
 8       Map Operator Tree:
 9           TableScan
10             alias: tbl_a
11             filterExpr: (col_1 = 'aaa.com') (type: boolean)
12             Statistics: Num rows: 5275084 Data size: 1039193932 Basic stats: COMPLETE Column stats: PARTIAL
13             Filter Operator
14               predicate: (col_1 = 'aaa.com') (type: boolean)
15               Statistics: Num rows: 2637542 Data size: 485307728 Basic stats: COMPLETE Column stats: PARTIAL
16               Select Operator
17                 expressions: date_local (type: string)
18                 outputColumnNames: date_local
19                 Statistics: Num rows: 2637542 Data size: 485307728 Basic stats: COMPLETE Column stats: PARTIAL
20                 Group By Operator
21                   aggregations: count()
22                   keys: date_local (type: string), 'aaa.com' (type: string)
23                   mode: hash
24                   outputColumnNames: _col0, _col1, _col2
25                   Statistics: Num rows: 1010 Data size: 291890 Basic stats: COMPLETE Column stats: PARTIAL
26                   Reduce Output Operator
27                     key expressions: _col0 (type: string), 'aaa.com' (type: string)
28                     sort order: ++
29                     Map-reduce partition columns: _col0 (type: string), 'aaa.com' (type: string)
30                     Statistics: Num rows: 1010 Data size: 291890 Basic stats: COMPLETE Column stats: PARTIAL
31                     value expressions: _col2 (type: bigint)
32       Reduce Operator Tree:
33         Group By Operator
34           aggregations: count(VALUE._col0)
35           keys: KEY._col0 (type: string), 'aaa.com' (type: string)
36           mode: mergepartial
37           outputColumnNames: _col0, _col1, _col2
38           Statistics: Num rows: 202 Data size: 58378 Basic stats: COMPLETE Column stats: PARTIAL
39           Select Operator
40             expressions: 'aaa.com' (type: string), _col0 (type: string), _col2 (type: bigint)
41             outputColumnNames: _col0, _col1, _col2
42             Statistics: Num rows: 202 Data size: 58378 Basic stats: COMPLETE Column stats: PARTIAL
43             File Output Operator
44               compressed: false
45               Statistics: Num rows: 202 Data size: 58378 Basic stats: COMPLETE Column stats: PARTIAL
46               table:
47                   input format: org.apache.hadoop.mapred.SequenceFileInputFormat
48                   output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
49                   serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

51   Stage: Stage-0
52     Fetch Operator
53       limit: -1
54       Processor Tree:
55         ListSink
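The STAGE DEPENDENCIES header at the top of the plan can also be read programmatically. The following is a minimal Python sketch, not part of Hive itself, that turns those header lines into a dependency map and derives an order in which the stages could run. The function names are hypothetical, and the parser assumes only the two simple line shapes that appear in this plan ("is a root stage" and "depends on stages: ..."); other forms that Hive can print are not handled.

```python
import re
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Header lines copied from the plan above (listing numbers removed).
PLAN_HEADER = """\
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1
"""


def parse_stage_dependencies(text):
    """Map each stage name to the set of stages it depends on.

    Assumption: only the two line shapes seen in this plan are handled.
    """
    deps = {}
    for line in text.splitlines():
        line = line.strip()
        root = re.fullmatch(r"(Stage-\d+) is a root stage", line)
        if root:
            deps[root.group(1)] = set()
            continue
        child = re.fullmatch(r"(Stage-\d+) depends on stages: (.+)", line)
        if child:
            parents = {name.strip() for name in child.group(2).split(",")}
            deps[child.group(1)] = parents
    return deps


def execution_order(deps):
    """Return an order in which every stage comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


deps = parse_stage_dependencies(PLAN_HEADER)
print(execution_order(deps))
```

For this plan the order is Stage-1 followed by Stage-0, which matches the header: Stage-1 is the root stage and Stage-0 depends on it.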

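· 1 ~ 3 : A Hive job consists of one or more stages, and there are dependencies between the stages. Here Stage-1 is the root stage and Stage-0 depends on it. The more complex a query is, the more stages it usually has and the longer it tends to take. A stage can be a MapReduce job, a sampling stage, a merge stage, a limit stage, or any other step Hive needs to perform. By default, Hive runs the stages one at a time.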

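· 5 : The STAGE PLANS section is long and detailed; it lists the operators that make up each stage.

· 6 ~ 7 : Stage-1 is the bulk of the processing for this job, and it runs as a MapReduce job.

· 8 : Everything after Map Operator Tree: runs in the map phase of the job.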

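· 9 : TableScan takes the table (tbl_a) as input and produces the columns that are passed to the operators below it.

· 13 : The Filter Operator applies the query's filter condition (col_1 = 'aaa.com').

· 16 : The Select Operator selects the columns to pass on. Only date_local is listed here; col_1 appears in the later operators as the constant 'aaa.com'.

· 20 : The Group By Operator performs the count requested by the query. Because the mode is hash, each map task builds partial counts per key before sending them to the reducers.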

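· 32 : Everything after Reduce Operator Tree: runs in the reduce phase of the job.

· 33 : The reduce side also has a Group By Operator. With mode mergepartial, it merges the partial counts received from each map task into the final count per key.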

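· 43, 48 : Finally, the File Output Operator writes the result to a file, using the output format shown on line 48 (HiveSequenceFileOutputFormat).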

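· 51 : Because the query has no LIMIT clause (limit: -1), Stage-0 is effectively a no-op stage: its Fetch Operator only fetches the output of Stage-1 without limiting it.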
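The two Group By Operators in the plan (mode: hash on the map side, mode: mergepartial on the reduce side) describe a two-phase aggregation. The Python sketch below imitates that idea on a handful of made-up rows. It is only an illustration of the technique under simplifying assumptions, not how Hive is implemented: the rows, the split into two map tasks, and the function names are all hypothetical.

```python
from collections import Counter


def map_side_partial_counts(rows, col_1_value):
    """Imitate one map task: filter, project, and pre-aggregate (mode: hash)."""
    partial = Counter()
    for row in rows:
        if row["col_1"] == col_1_value:              # Filter Operator
            key = (row["date_local"], col_1_value)   # Group By keys
            partial[key] += 1                        # count()
    return partial


def reduce_side_merge(partials):
    """Imitate the reducer: add up partial counts per key (mode: mergepartial)."""
    total = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds the counts together
    return total


# Hypothetical input rows, split across two map tasks.
split_a = [
    {"col_1": "aaa.com", "date_local": "2020-09-21"},
    {"col_1": "bbb.com", "date_local": "2020-09-21"},
    {"col_1": "aaa.com", "date_local": "2020-09-22"},
]
split_b = [
    {"col_1": "aaa.com", "date_local": "2020-09-21"},
    {"col_1": "aaa.com", "date_local": "2020-09-21"},
]

partials = [map_side_partial_counts(s, "aaa.com") for s in (split_a, split_b)]
result = reduce_side_merge(partials)

for (date_local, col_1), cnt in sorted(result.items()):
    print(col_1, date_local, cnt)
```

Each map task sends at most one partial count per key instead of one record per matching row, which is consistent with the drop in estimated rows between the Select Operator and the Reduce Output Operator in the plan.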

2020-09-21 / Sungwook Kang / http://sungwookkang.com


License: Attribution, NonCommercial, NoDerivatives
