Using Hive EXPLAIN (Execution Plan)

SungWookKang 2020. 9. 22. 14:27

· Version : Hive

In Hive, the EXPLAIN command shows the execution plan of a query, that is, how Hive translates the query into MapReduce jobs. To use it, put the EXPLAIN keyword in front of the query, as in the script below.

EXPLAIN
SELECT col_1, date_local, count(*) AS cnt
FROM tbl_a
WHERE col_1 = 'aaa.com'
GROUP BY date_local, col_1;

The execution plan below is the result returned for the query above.

 1  STAGE DEPENDENCIES:
 2    Stage-1 is a root stage
 3    Stage-0 depends on stages: Stage-1

 5  STAGE PLANS:
 6    Stage: Stage-1
 7      Map Reduce
 8        Map Operator Tree:
 9            TableScan
10              alias: tbl_a
11              filterExpr: (col_1 = 'aaa.com') (type: boolean)
12              Statistics: Num rows: 5275084 Data size: 1039193932 Basic stats: COMPLETE Column stats: PARTIAL
13              Filter Operator
14                predicate: (col_1 = 'aaa.com') (type: boolean)
15                Statistics: Num rows: 2637542 Data size: 485307728 Basic stats: COMPLETE Column stats: PARTIAL
16                Select Operator
17                  expressions: date_local (type: string)
18                  outputColumnNames: date_local
19                  Statistics: Num rows: 2637542 Data size: 485307728 Basic stats: COMPLETE Column stats: PARTIAL
20                  Group By Operator
21                    aggregations: count()
22                    keys: date_local (type: string), 'aaa.com' (type: string)
23                    mode: hash
24                    outputColumnNames: _col0, _col1, _col2
25                    Statistics: Num rows: 1010 Data size: 291890 Basic stats: COMPLETE Column stats: PARTIAL
26                    Reduce Output Operator
27                      key expressions: _col0 (type: string), 'aaa.com' (type: string)
28                      sort order: ++
29                      Map-reduce partition columns: _col0 (type: string), 'aaa.com' (type: string)
30                      Statistics: Num rows: 1010 Data size: 291890 Basic stats: COMPLETE Column stats: PARTIAL
31                      value expressions: _col2 (type: bigint)
32        Reduce Operator Tree:
33          Group By Operator
34            aggregations: count(VALUE._col0)
35            keys: KEY._col0 (type: string), 'aaa.com' (type: string)
36            mode: mergepartial
37            outputColumnNames: _col0, _col1, _col2
38            Statistics: Num rows: 202 Data size: 58378 Basic stats: COMPLETE Column stats: PARTIAL
39            Select Operator
40              expressions: 'aaa.com' (type: string), _col0 (type: string), _col2 (type: bigint)
41              outputColumnNames: _col0, _col1, _col2
42              Statistics: Num rows: 202 Data size: 58378 Basic stats: COMPLETE Column stats: PARTIAL
43              File Output Operator
44                compressed: false
45                Statistics: Num rows: 202 Data size: 58378 Basic stats: COMPLETE Column stats: PARTIAL
46                table:
47                    input format: org.apache.hadoop.mapred.SequenceFileInputFormat
48                    output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
49                    serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

51    Stage: Stage-0
52      Fetch Operator
53        limit: -1
54        Processor Tree:
55          ListSink

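· 1 ~ 3 : A Hive job consists of one or more stages. Stages can depend on one another, and a more complex query generally produces more stages and takes longer to run. A stage can be a MapReduce job, a sampling stage, a merge stage, a limit stage, or any other step Hive needs to do its work. By default, Hive executes the stages one at a time. In this plan, Stage-1 is the root stage and Stage-0 depends on it.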

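· 5 : The STAGE PLANS section describes each stage in detail and is the longest, most complex part of the output.

· 6 ~ 7 : Stage-1 holds the bulk of the processing for this job and runs as a MapReduce job.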

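· 8 : Everything under Map Operator Tree: runs in the map phase of the job.

· 9 : The TableScan operator takes the table (alias tbl_a) as input and produces the columns that the following operators consume.

· 13 : The Filter Operator applies the query's filter condition (col_1 = 'aaa.com').

· 16 : The Select Operator chooses the columns to pass on. Only date_local is projected here, because after the filter col_1 is replaced by the constant 'aaa.com'.

· 20 : The Group By Operator performs the count requested by the query. In the map phase it runs in hash mode and computes partial counts per key.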

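· 32 : Everything under Reduce Operator Tree: runs in the reduce phase of the job.

· 33 : The reduce side also has a Group By Operator. In mergepartial mode it merges the partial counts received from each mapper into the final count per key.

· 43, 48 : Finally, the File Output Operator writes the result to a file, using the output format shown on line 48 (HiveSequenceFileOutputFormat).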

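· 51 : Stage-0 only fetches the result for the client. Because the query has no LIMIT clause (limit: -1), it does no additional work and is effectively a no-op stage.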
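
The note on Stage-0 and LIMIT can be checked by adding a LIMIT clause to the same query. The sketch below reuses the table and column names from the example above; the value 10 is an arbitrary choice made only for illustration. With a LIMIT clause, the Fetch Operator in Stage-0 should report that limit instead of -1, although the exact plan depends on the Hive version and its settings.

```sql
-- Same query as the example above, with an illustrative LIMIT.
-- The value 10 is an assumption, not part of the original query.
EXPLAIN
SELECT col_1, date_local, count(*) AS cnt
FROM tbl_a
WHERE col_1 = 'aaa.com'
GROUP BY date_local, col_1
LIMIT 10;
```

Comparing the Stage-0 section of this plan with the one above is a quick way to see what the limit field of the Fetch Operator reflects.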

2020-09-21 / Sungwook Kang / http://sungwookkang.com

Hadoop, Big Data, data analysis, Hive, query execution plan, Hive query, Hive Query execution plan

Attribution, NonCommercial, NoDerivatives