Notes on problems after upgrading to Hive 0.13

Source: 动视网 | Editor: 小采 | Date: 2020-11-09 13:07:50

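A single Hive table has too many partitions (in practice, the more partitions a table has, the slower queries run; keep the count below 5000), and queries fail with:
java.lang.OutOfMemoryError: Java heap space
Reference: OOM occurs when query spans to a large number of partitions
Cause:
Before running a query, Hive loads the partition information from the metastore into memory, including rows from the PARTITIONS, PARTITION_KEY_VALS, and PARTITION_PARAMS tables. The more partitions there are, the more data these tables hold, and because the default hiveserver2 heap is only 256 MB, the heap runs out.
If hive-site.xml sets mapred.reduce.tasks to a large number (the default is -1, which lets Hive choose the reducer count itself), each query job spawns more tasks, and with many partitions each task has more partition data to load. A memory setting in mapred-site.xml that is too low also causes errors during query execution; it can be raised as needed: mapred.child.java.opts=-Xmx512m -XX:+UseConcMarkSweepGC
Fix: partition by a different rule so the target table has fewer partitions, then edit hive-env.sh and add: export HADOOP_HEAPSIZE=2048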
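To find which tables exceed the 5000-partition guideline above, one option is to count rows per table in the metastore database. The following is a minimal sketch, not a tested production script. It assumes the standard metastore tables TBLS and PARTITIONS joined on TBL_ID, and it uses an in-memory SQLite database with made-up table names (`logs_hourly`, `orders`) as a stand-in for the real metastore, which is usually MySQL or Derby.

```python
import sqlite3

PARTITION_LIMIT = 5000  # guideline taken from the note above


def tables_over_limit(conn, limit=PARTITION_LIMIT):
    """Return [(table_name, partition_count)] for tables above `limit`."""
    query = (
        "SELECT t.TBL_NAME, COUNT(*) AS n "
        "FROM PARTITIONS p JOIN TBLS t ON p.TBL_ID = t.TBL_ID "
        "GROUP BY t.TBL_NAME HAVING COUNT(*) > ? "
        "ORDER BY n DESC"
    )
    return conn.execute(query, (limit,)).fetchall()


def demo_connection():
    """Build an in-memory stand-in for the metastore (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, TBL_NAME TEXT)")
    conn.execute(
        "CREATE TABLE PARTITIONS "
        "(PART_ID INTEGER PRIMARY KEY, TBL_ID INTEGER, PART_NAME TEXT)"
    )
    conn.executemany(
        "INSERT INTO TBLS VALUES (?, ?)", [(1, "logs_hourly"), (2, "orders")]
    )
    # 6000 partitions for the first table, 30 for the second.
    rows = [(None, 1, "dt=%d" % i) for i in range(6000)]
    rows += [(None, 2, "dt=%d" % i) for i in range(30)]
    conn.executemany("INSERT INTO PARTITIONS VALUES (?, ?, ?)", rows)
    return conn


if __name__ == "__main__":
    for name, count in tables_over_limit(demo_connection()):
        print("%s has %d partitions" % (name, count))
```

Against a real MySQL metastore the same query should apply, but the driver's parameter placeholder is typically `%s` rather than `?`.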
After upgrading from Hive 0.12 to 0.13 and starting hiveserver2, every query run from a beeline session fails with:
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.hadoop.conf.Configuration.unset(Ljava/lang/String;)V
Cause: Hadoop 1.0.3 has no Configuration.unset(String) method, as a comparison of the Configuration API docs for 2.2.0 and 1.0.4 shows.
Reference: NoSuchMethodError exception when using HIVE 0.13 with Hadoop 1.0.4
Fix: choose any one of the following three methods
Pass the parameter when starting hiveserver2: hiveserver2 --hiveconf fs.permissions.umask-mode=022
Patch the source for Hadoop 1.0.3: in org/apache/hadoop/hive/ql/exec/Utilities.java, change line 3417 to: conf.set("fs.permissions.umask-mode", ""); then recompile and replace that class in hive-exec-0.13.0.jar.
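Edit hive-site.xml and add the following configuration:

<property>
  <name>fs.permissions.umask-mode</name>
  <value>022</value>
  <description>Setting a value for fs.permissions.umask-mode to work around issue in HIVE-6962.
  It has no impact in hadoop 1.x line on hdfs operations.</description>
</property>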
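If the hive-site.xml change has to be applied on several hosts, it could be scripted. The sketch below is an illustration only: `ensure_property` is a hypothetical helper name, and it assumes the file uses the usual `<configuration>` root with `<property>` children. It adds the property if missing and updates the value if it is already present.

```python
import xml.etree.ElementTree as ET


def ensure_property(xml_text, name, value, description=None):
    """Return the XML text with property `name` set to `value`."""
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            # Property already exists: update its value in place.
            value_el = prop.find("value")
            if value_el is None:
                value_el = ET.SubElement(prop, "value")
            value_el.text = value
            break
    else:
        # Property not found: append a new <property> element.
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        if description:
            ET.SubElement(prop, "description").text = description
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = "<configuration></configuration>"
    print(ensure_property(
        original,
        "fs.permissions.umask-mode",
        "022",
        "Work around HIVE-6962 on hadoop 1.x.",
    ))
```

Note that the default ElementTree parser drops XML comments, so a file with hand-written comments would lose them when rewritten this way.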

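After upgrading to Hive 0.13, Hue 3.5 stops working and shows the following message:
Bad status for request TFetchResultsReq(operationHandle=TOperationHandle(hasResultSet=False, modifiedRowCount=None, operationType=0
hive.log contains an error similar to:
org.apache.hive.service.cli.HiveSQLException: Invalid SessionHandle: SessionHandle [b07190-9db8-43c8-a600-b93453be887b]
References: hue 3.5.0 not work with hive 0.13; HUE-2095 [beeswax] Do not fetch statements without a resultset
Cause: in the TCLIService.thrift file supplied with the patch, the definition of the TOperationHandle struct at line 504 explains that when the boolean hasResultSet is true, the operation produces a result set that can be fetched (this result set is not None, but its size may be 0). When it is false, the returned result set is None, and iterating over it raises an exception.
Fix: choose any one of the following, listed from hardest to easiest
Upgrade Hue to version 3.6. Download: hue.zip, or fetch it with git and reinstall: git clone http://go.rritw.com/github.com/cloudera/hue.git
Merge the existing Hue 3.5 branch up to 3.6 (risky and untested). To list branches: git branch -l
Or edit the Python file directly: hue/apps/beeswax/src/beeswax/server/hive_server2_lib.py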

class HiveServerDataTable(DataTable):
  def __init__(self, results, schema, operation_handle):
    self.schema = schema and schema.schema
    self.operation_handle = operation_handle
    # results is None when the statement produced no result set
    if results is not None:
      self.row_set = HiveServerTRowSet(results.results, schema)
      self.has_more = not self.row_set.is_empty() # Should be results.hasMoreRows but always True in HS2
      self.startRowOffset = self.row_set.startRowOffset # Always 0 in HS2

  # The fetch_result method of the client class in the same file:
  def fetch_result(self, operation_handle, orientation=TFetchOrientation.FETCH_NEXT, max_rows=1000):
    # Only ask the server for metadata and rows when a result set exists
    if operation_handle.hasResultSet:
      meta_req = TGetResultSetMetadataReq(operationHandle=operation_handle)
      schema = self.call(self._client.GetResultSetMetadata, meta_req)
      fetch_req = TFetchResultsReq(operationHandle=operation_handle, orientation=orientation, maxRows=max_rows)
      res = self.call(self._client.FetchResults, fetch_req)
    else:
      schema = None
      res = None
    return res, schema
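The guard above can be tried in isolation. The following self-contained sketch is not Hue code: `OperationHandle`, `FakeClient`, and `fetch_rows` are hypothetical stand-ins for the Thrift handle, the HiveServer2 client, and the fetch call. It only illustrates that no fetch is issued when hasResultSet is false.

```python
from collections import namedtuple

# Stand-in for TOperationHandle; only the field relevant here is modelled.
OperationHandle = namedtuple("OperationHandle", ["hasResultSet"])


class FakeClient(object):
    """Stand-in for the Thrift client; counts how often fetch is called."""

    def __init__(self):
        self.fetch_calls = 0

    def fetch(self, handle):
        self.fetch_calls += 1
        return ["row-1", "row-2"]


def fetch_rows(client, handle):
    """Fetch rows only when the handle reports a result set."""
    if not handle.hasResultSet:
        # Nothing to fetch (e.g. a DDL statement): skip the server call.
        return []
    return list(client.fetch(handle))


if __name__ == "__main__":
    client = FakeClient()
    print(fetch_rows(client, OperationHandle(hasResultSet=False)))
    print(fetch_rows(client, OperationHandle(hasResultSet=True)))
    print("fetch calls:", client.fetch_calls)
```

Returning an empty list instead of None is a design choice made in this sketch so that callers can iterate without a None check. The patch shown above returns None and guards in the constructor instead.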

Original article: "Notes on problems after upgrading to Hive 0.13" (升级到hive0.13 问题记录). Thanks to the original author for sharing.
