[Hive Learning, Part 8] Hive Tuning [Important]


Environment
  VM: VMware 10
  Linux: CentOS-6.5-x86_64
  SSH client: Xshell 4
  FTP client: Xftp 4
  JDK 8
  hadoop-3.1.1
  apache-hive-3.1.1

I. Execution Plan
Core idea: optimize a Hive SQL statement as if it were the MapReduce program it compiles into.
The following kinds of SQL are not converted into MapReduce jobs; Hive serves them with a simple fetch task (see the sketch below the list):
  - a SELECT that reads only the table's own columns
  - a WHERE that filters only on the table's own columns
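
For illustration, a minimal sketch against the psn2 table used below (whether a statement stays a fetch task depends on hive.fetch.task.conversion; its default value "more" is assumed here):

  -- served by a fetch task: no MapReduce job is launched
  select id, name from psn2;
  select id, name from psn2 where age = 10;
  -- an aggregation needs a shuffle, so this compiles into a MapReduce job
  select count(*) from psn2;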

EXPLAIN displays the execution plan: EXPLAIN [EXTENDED] query

hive> explain select count(*) from psn2;
OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: psn2
            Statistics: Num rows: 2 Data size: 7440 Basic stats: COMPLETE Column stats: NONE
            Select Operator
              Statistics: Num rows: 2 Data size: 7440 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  value expressions: _col0 (type: bigint)
      Execution mode: vectorized
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 2.7 seconds, Fetched: 43 row(s)
hive>

 

hive> explain extended select count(*) from psn2;
OK
Explain
STAGE DEPENDENCIES:
  Stage-1 is a root stage
  Stage-0 depends on stages: Stage-1

STAGE PLANS:
  Stage: Stage-1
    Map Reduce
      Map Operator Tree:
          TableScan
            alias: psn2
            Statistics: Num rows: 2 Data size: 7440 Basic stats: COMPLETE Column stats: NONE
            GatherStats: false
            Select Operator
              Statistics: Num rows: 2 Data size: 7440 Basic stats: COMPLETE Column stats: NONE
              Group By Operator
                aggregations: count()
                mode: hash
                outputColumnNames: _col0
                Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                Reduce Output Operator
                  null sort order: 
                  sort order: 
                  Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
                  tag: -1
                  value expressions: _col0 (type: bigint)
                  auto parallelism: false
      Execution mode: vectorized
      Path -> Alias:
        hdfs://PCS102:9820/root/hive_remote/warehouse/psn2/age=10 [psn2]
        hdfs://PCS102:9820/root/hive_remote/warehouse/psn2/age=20 [psn2]
      Path -> Partition:
        hdfs://PCS102:9820/root/hive_remote/warehouse/psn2/age=10 
          Partition
            base file name: age=10
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            partition values:
              age 10
            properties:
              bucket_count -1
              collection.delim -
              column.name.delimiter ,
              columns id,name,likes,address
              columns.comments 
              columns.types int:string:array<string>:map<string,string>
              field.delim ,
              file.inputformat org.apache.hadoop.mapred.TextInputFormat
              file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              line.delim 

              location hdfs://PCS102:9820/root/hive_remote/warehouse/psn2/age=10
              mapkey.delim :
              name default.psn2
              numFiles 1
              numRows 0
              partition_columns age
              partition_columns.types int
              rawDataSize 0
              serialization.ddl struct psn2 { i32 id, string name, list<string> likes, map<string,string> address}
              serialization.format ,
              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              totalSize 372
              transient_lastDdlTime 1548986286
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              properties:
                bucket_count -1
                bucketing_version 2
                collection.delim -
                column.name.delimiter ,
                columns id,name,likes,address
                columns.comments 
                columns.types int:string:array<string>:map<string,string>
                field.delim ,
                file.inputformat org.apache.hadoop.mapred.TextInputFormat
                file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                line.delim 

                location hdfs://PCS102:9820/root/hive_remote/warehouse/psn2
                mapkey.delim :
                name default.psn2
                partition_columns age
                partition_columns.types int
                serialization.ddl struct psn2 { i32 id, string name, list<string> likes, map<string,string> address}
                serialization.format ,
                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                transient_lastDdlTime 1548915997
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.psn2
            name: default.psn2
        hdfs://PCS102:9820/root/hive_remote/warehouse/psn2/age=20 
          Partition
            base file name: age=20
            input format: org.apache.hadoop.mapred.TextInputFormat
            output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
            partition values:
              age 20
            properties:
              bucket_count -1
              collection.delim -
              column.name.delimiter ,
              columns id,name,likes,address
              columns.comments 
              columns.types int:string:array<string>:map<string,string>
              field.delim ,
              file.inputformat org.apache.hadoop.mapred.TextInputFormat
              file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              line.delim 

              location hdfs://PCS102:9820/root/hive_remote/warehouse/psn2/age=20
              mapkey.delim :
              name default.psn2
              numFiles 1
              numRows 0
              partition_columns age
              partition_columns.types int
              rawDataSize 0
              serialization.ddl struct psn2 { i32 id, string name, list<string> likes, map<string,string> address}
              serialization.format ,
              serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              totalSize 372
              transient_lastDdlTime 1548986540
            serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
          
              input format: org.apache.hadoop.mapred.TextInputFormat
              output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
              properties:
                bucket_count -1
                bucketing_version 2
                collection.delim -
                column.name.delimiter ,
                columns id,name,likes,address
                columns.comments 
                columns.types int:string:array<string>:map<string,string>
                field.delim ,
                file.inputformat org.apache.hadoop.mapred.TextInputFormat
                file.outputformat org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
                line.delim 

                location hdfs://PCS102:9820/root/hive_remote/warehouse/psn2
                mapkey.delim :
                name default.psn2
                partition_columns age
                partition_columns.types int
                serialization.ddl struct psn2 { i32 id, string name, list<string> likes, map<string,string> address}
                serialization.format ,
                serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                transient_lastDdlTime 1548915997
              serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
              name: default.psn2
            name: default.psn2
      Truncated Path -> Alias:
        /psn2/age=10 [psn2]
        /psn2/age=20 [psn2]
      Needs Tagging: false
      Reduce Operator Tree:
        Group By Operator
          aggregations: count(VALUE._col0)
          mode: mergepartial
          outputColumnNames: _col0
          Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
          File Output Operator
            compressed: false
            GlobalTableId: 0
            directory: hdfs://PCS102:9820/tmp/hive/root/6f8ff71f-87bd-4d46-9f9a-516708d65459/hive_2019-02-19_10-58-42_159_2637812497308639143-1/-mr-10001/.hive-staging_hive_2019-02-19_10-58-42_159_2637812497308639143-1/-ext-10002
            NumFilesPerFileSink: 1
            Statistics: Num rows: 1 Data size: 8 Basic stats: COMPLETE Column stats: NONE
            Stats Publishing Key Prefix: hdfs://PCS102:9820/tmp/hive/root/6f8ff71f-87bd-4d46-9f9a-516708d65459/hive_2019-02-19_10-58-42_159_2637812497308639143-1/-mr-10001/.hive-staging_hive_2019-02-19_10-58-42_159_2637812497308639143-1/-ext-10002/
            table:
                input format: org.apache.hadoop.mapred.SequenceFileInputFormat
                output format: org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
                properties:
                  columns _col0
                  columns.types bigint
                  escape.delim 
                  hive.serialization.extend.additional.nesting.levels true
                  serialization.escape.crlf true
                  serialization.format 1
                  serialization.lib org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
                serde: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
            TotalFiles: 1
            GatherStats: false
            MultiFileSpray: false

  Stage: Stage-0
    Fetch Operator
      limit: -1
      Processor Tree:
        ListSink

Time taken: 0.142 seconds, Fetched: 192 row(s)
hive> 

II. Execution Modes
(1) Queries run in either local mode or cluster mode.
(2) Enable local mode (for tables with small amounts of data):
  set hive.exec.mode.local.auto=true;
Note:
  hive.exec.mode.local.auto.inputbytes.max defaults to 128M and caps the input size
  eligible for local mode; if the input exceeds it, the query still runs on the cluster!
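
A minimal session sketch (the property names are real Hive settings; the two limits are shown at their default values, for illustration only):

  set hive.exec.mode.local.auto=true;                      -- let Hive choose local mode automatically
  set hive.exec.mode.local.auto.inputbytes.max=134217728;  -- input-size ceiling for local mode (128M)
  set hive.exec.mode.local.auto.input.files.max=4;         -- maximum number of input files for local mode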

hive> set hive.exec.mode.local.auto=true;
hive> select count(*)  from psn21;
Automatically selecting local only mode for query
Query ID = root_20190219144810_0bafff9e-1c40-45f6-b687-60c5d13c9f0c
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-02-19 14:48:11,839 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1827024396_0002
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 4702 HDFS Write: 623 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
_c0
6
Time taken: 1.376 seconds, Fetched: 1 row(s)
hive> set hive.exec.mode.local.auto=false;
hive> select count(*)  from psn21;
Query ID = root_20190219144841_6fd11106-5db1-4335-8b0b-884697b558df
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548397153910_0013, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0013/
Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job  -kill job_1548397153910_0013
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-02-19 14:48:49,046 Stage-1 map = 0%,  reduce = 0%
2019-02-19 14:48:54,210 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.87 sec
2019-02-19 14:48:59,328 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.28 sec
MapReduce Total cumulative CPU time: 6 seconds 280 msec
Ended Job = job_1548397153910_0013
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.28 sec   HDFS Read: 13976 HDFS Write: 101 SUCCESS
Total MapReduce CPU Time Spent: 6 seconds 280 msec
OK
_c0
6
Time taken: 18.923 seconds, Fetched: 1 row(s)
hive> 

III. Parallel Execution
Enable parallel mode with the following parameter (local mode needs to be turned off):
set hive.exec.parallel=true;

Also: hive.exec.parallel.thread.number is the maximum number of jobs allowed to run in parallel for a single SQL query (see the sketch below).
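
A short session sketch (both properties are real Hive settings; 8 is the default value, shown only for illustration):

  set hive.exec.parallel=true;              -- run independent stages of one query concurrently
  set hive.exec.parallel.thread.number=8;   -- at most 8 jobs in flight for a single query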

hive> set hive.exec.parallel;
hive.exec.parallel=false
hive> select t1.cnt1,t2.cnt2 from 
    > (select count(id) cnt1 from psn21) t1,
    > (select count(name) cnt2 from psn21)t2;
Warning: Map Join MAPJOIN[27][bigTable=?] in task Stage-4:MAPRED is a cross product
Warning: Map Join MAPJOIN[34][bigTable=?] in task Stage-5:MAPRED is a cross product
Warning: Shuffle Join JOIN[14][tables = [$hdt$_0, $hdt$_1]] in Stage Stage-2:MAPRED is a cross product
Query ID = root_20190219145608_b4f3d4e9-b858-41be-9ddc-6eccff0ec9d9
Total jobs = 5
Launching Job 1 out of 5
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548397153910_0014, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0014/
Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job  -kill job_1548397153910_0014
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-02-19 14:56:15,027 Stage-1 map = 0%,  reduce = 0%
2019-02-19 14:56:20,148 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.85 sec
2019-02-19 14:56:25,277 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 6.04 sec
MapReduce Total cumulative CPU time: 6 seconds 40 msec
Ended Job = job_1548397153910_0014
Launching Job 2 out of 5
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548397153910_0015, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0015/
Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job  -kill job_1548397153910_0015
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2019-02-19 14:56:37,661 Stage-3 map = 0%,  reduce = 0%
2019-02-19 14:56:42,786 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 2.8 sec
2019-02-19 14:56:46,875 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 5.92 sec
MapReduce Total cumulative CPU time: 5 seconds 920 msec
Ended Job = job_1548397153910_0015
Stage-7 is selected by condition resolver.
Stage-8 is filtered out by condition resolver.
Stage-2 is filtered out by condition resolver.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.1-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2019-02-19 14:56:55    Starting to launch local task to process map join;    maximum memory = 239075328
2019-02-19 14:56:56    Dump the side-table for tag: 1 with group count: 1 into file: file:/tmp/root/6f8ff71f-87bd-4d46-9f9a-516708d65459/hive_2019-02-19_14-56-08_997_6748376838876035123-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable
2019-02-19 14:56:56    Uploaded 1 File to: file:/tmp/root/6f8ff71f-87bd-4d46-9f9a-516708d65459/hive_2019-02-19_14-56-08_997_6748376838876035123-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile01--.hashtable (278 bytes)

Execution completed successfully
MapredLocal task succeeded
Launching Job 4 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1548397153910_0016, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0016/
Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job  -kill job_1548397153910_0016
Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 0
2019-02-19 14:57:02,370 Stage-4 map = 0%,  reduce = 0%
2019-02-19 14:57:07,478 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 3.07 sec
MapReduce Total cumulative CPU time: 3 seconds 70 msec
Ended Job = job_1548397153910_0016
MapReduce Jobs Launched: 
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 6.04 sec   HDFS Read: 13637 HDFS Write: 114 SUCCESS
Stage-Stage-3: Map: 1  Reduce: 1   Cumulative CPU: 5.92 sec   HDFS Read: 13641 HDFS Write: 114 SUCCESS
Stage-Stage-4: Map: 1   Cumulative CPU: 3.07 sec   HDFS Read: 6091 HDFS Write: 103 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 30 msec
OK
t1.cnt1    t2.cnt2
6    6
Time taken: 59.527 seconds, Fetched: 1 row(s)
hive> set hive.exec.parallel=true;
hive> (select count(name) cnt2 from psn21)t2;
FAILED: ParseException line 1:36 extraneous input t2 expecting EOF near <EOF>
hive> select t1.cnt1,t2.cnt2 from 
    > (select count(id) cnt1 from psn21) t1,
    > (select count(name) cnt2 from psn21)t2;
Warning: Map Join MAPJOIN[27][bigTable=?] in task Stage-4:MAPRED is a cross product
Warning: Map Join MAPJOIN[34][bigTable=?] in task Stage-5:MAPRED is a cross product
Warning: Shuffle Join JOIN[14][tables = [$hdt$_0, $hdt$_1]] in Stage Stage-2:MAPRED is a cross product
Query ID = root_20190219145918_2f98437b-7070-41a4-905b-4c6a3a160d46
Total jobs = 5
Launching Job 1 out of 5
Launching Job 2 out of 5
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Starting Job = job_1548397153910_0018, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0018/
Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job  -kill job_1548397153910_0018
Starting Job = job_1548397153910_0017, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0017/
Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job  -kill job_1548397153910_0017
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2019-02-19 14:59:25,322 Stage-3 map = 0%,  reduce = 0%
2019-02-19 14:59:29,510 Stage-3 map = 100%,  reduce = 0%, Cumulative CPU 3.0 sec
2019-02-19 14:59:34,612 Stage-3 map = 100%,  reduce = 100%, Cumulative CPU 6.25 sec
MapReduce Total cumulative CPU time: 6 seconds 250 msec
Ended Job = job_1548397153910_0018
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2019-02-19 14:59:46,581 Stage-1 map = 0%,  reduce = 0%
2019-02-19 14:59:51,687 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.74 sec
2019-02-19 14:59:56,796 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 5.9 sec
MapReduce Total cumulative CPU time: 5 seconds 900 msec
Ended Job = job_1548397153910_0017
Stage-7 is selected by condition resolver.
Stage-8 is filtered out by condition resolver.
Stage-2 is filtered out by condition resolver.
SLF4J: Found binding in [jar:file:/usr/local/apache-hive-3.1.1-bin/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2019-02-19 15:00:07    Dump the side-table for tag: 1 with group count: 1 into file: file:/tmp/root/6f8ff71f-87bd-4d46-9f9a-516708d65459/hive_2019-02-19_14-59-18_586_8660726948780795909-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile21--.hashtable
2019-02-19 15:00:07    Uploaded 1 File to: file:/tmp/root/6f8ff71f-87bd-4d46-9f9a-516708d65459/hive_2019-02-19_14-59-18_586_8660726948780795909-1/-local-10006/HashTable-Stage-4/MapJoin-mapfile21--.hashtable (278 bytes)

Execution completed successfully
MapredLocal task succeeded
Launching Job 4 out of 5
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1548397153910_0019, Tracking URL = http://PCS102:8088/proxy/application_1548397153910_0019/
Kill Command = /usr/local/hadoop-3.1.1/bin/mapred job  -kill job_1548397153910_0019
Hadoop job information for Stage-4: number of mappers: 1; number of reducers: 0
2019-02-19 15:00:16,324 Stage-4 map = 0%,  reduce = 0%
2019-02-19 15:00:20,426 Stage-4 map = 100%,  reduce = 0%, Cumulative CPU 2.95 sec
MapReduce Total cumulative CPU time: 2 seconds 950 msec
Ended Job = job_1548397153910_0019
MapReduce Jobs Launched: 
Stage-Stage-3: Map: 1  Reduce: 1   Cumulative CPU: 6.25 sec   HDFS Read: 13641 HDFS Write: 114 SUCCESS
Stage-Stage-1: Map: 1  Reduce: 1   Cumulative CPU: 5.9 sec   HDFS Read: 13637 HDFS Write: 114 SUCCESS
Stage-Stage-4: Map: 1   Cumulative CPU: 2.95 sec   HDFS Read: 6091 HDFS Write: 103 SUCCESS
Total MapReduce CPU Time Spent: 15 seconds 100 msec
OK
t1.cnt1    t2.cnt2
6    6
Time taken: 64.206 seconds, Fetched: 1 row(s)
hive> 

IV. Strict Mode
Enable strict mode with the following setting:
set hive.mapred.mode=strict;
(the default is nonstrict, i.e. non-strict mode)

Query restrictions under strict mode:
1. A query on a partitioned table must include a WHERE filter on its partition columns;

hive> set hive.mapred.mode=nonstrict;
hive> select * from psn22;
OK
psn22.id    psn22.name    psn22.likes    psn22.address    psn22.age    psn22.sex
1    小明1    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    18    boy
2    小明2    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    20    man
5    小明5    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    21    boy
3    小明3    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    21    boy
6    小明6    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    21    man
4    小明4    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    21    man
Time taken: 0.186 seconds, Fetched: 6 row(s)
hive> set hive.mapred.mode=strict;
hive> select * from psn22;
FAILED: SemanticException [Error 10056]: Queries against partitioned tables without a partition filter are disabled for safety reasons. If you know what you are doing, please set hive.strict.checks.no.partition.filter to false and make sure that hive.mapred.mode is not set to 'strict' to proceed. Note that you may get errors or incorrect results if you make a mistake while using some of the unsafe features. No partition predicate for Alias "psn22" Table "psn22"
hive> select * from psn22 where age=18 and sex='boy';
OK
psn22.id    psn22.name    psn22.likes    psn22.address    psn22.age    psn22.sex
1    小明1    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}    18    boy
Time taken: 0.282 seconds, Fetched: 1 row(s)
hive> 

2. An ORDER BY statement must include a LIMIT clause;

hive> set hive.mapred.mode=strict;
hive> select * from psn21 order by id;
FAILED: SemanticException 1:29 Order by-s without limit are disabled for safety reasons. If you know what you are doing, please set hive.strict.checks.orderby.no.limit to false and make sure that hive.mapred.mode is not set to 'strict' to proceed. Note that you may get errors or incorrect results if you make a mistake while using some of the unsafe features. Error encountered near token 'id'
hive> select * from psn21 order by id limit 2;
Automatically selecting local only mode for query
Query ID = root_20190219143842_b465a76f-a890-4bdc-aa76-b713c3ea13c0
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapreduce.job.reduces=<number>
Job running in-process (local Hadoop)
2019-02-19 14:38:43,896 Stage-1 map = 100%,  reduce = 100%
Ended Job = job_local1585589360_0001
MapReduce Jobs Launched: 
Stage-Stage-1:  HDFS Read: 1696 HDFS Write: 261 SUCCESS
Total MapReduce CPU Time Spent: 0 msec
OK
psn21.id    psn21.name    psn21.age    psn21.sex    psn21.likes    psn21.address
1    小明1    18    boy    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}
2    小明2    20    man    ["lol","book","movie"]    {"beijing":"shangxuetang","shanghai":"pudong"}
Time taken: 1.89 seconds, Fetched: 2 row(s)
hive>

3. Queries that execute a Cartesian product are restricted (see the sketch below).
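
A minimal sketch of what this restriction rejects and the usual fix, reusing psn21 from above (the self-join condition is purely illustrative):

  set hive.mapred.mode=strict;
  -- rejected in strict mode: no join condition, so this is a Cartesian product
  select a.id, b.id from psn21 a, psn21 b;
  -- accepted: an explicit ON condition makes it an equi-join
  select a.id, b.id from psn21 a join psn21 b on a.id = b.id;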

 

Source: https://www.cnblogs.com/cac2020/p/10401411.html
