Command-Line Installation#
Hive HA Deployment#
Prerequisites#
Hive depends on HDFS, YARN, and ZooKeeper clusters, and uses PostgreSQL (PG) for metadata storage.
If Hive is deployed in HA + Kerberos mode, ZooKeeper must have Kerberos authentication enabled.
For ZooKeeper installation and deployment, see: ZooKeeper Installation.
The ZooKeeper service addresses are assumed to be zookeeper1:2181,zookeeper2:2181,zookeeper3:2181.
For YARN installation and deployment, see: YARN Installation.
The YARN service addresses are assumed to be yarn1:8090,yarn2:8090,yarn3:8090.
For HDFS installation and deployment, see: HDFS Installation.
The HDFS service addresses are assumed to be hdfs1:9000,hdfs2:9000,hdfs3:9000.
Hive requires an external database for metadata storage; by default it uses the Postgres database of the Skylab platform itself.
The Postgres address is assumed to be PG1.
If Kerberos authentication is required, a KDC service must be deployed in advance; see: Kerberos Installation.
The KDC service address is assumed to be kdc1.
If Hive is deployed separately from the HDFS/YARN clusters, the HDFS client must be installed on all Hive machines and the HDFS configuration files synchronized to them.
If Ranger authorization is enabled, see: Ranger Installation for Ranger deployment.
The Ranger service address is assumed to be ranger1.
Configure the yum Repository and Install lava#
Log in to the hive1 machine and switch to the root user:
ssh hive1
su root
Configure the yum repository and install the lava command-line management tool:
# Fetch the repo file from the machine hosting the yum repository (assumed to be 192.168.1.10)
scp root@192.168.1.10:/etc/yum.repos.d/oushu.repo /etc/yum.repos.d/oushu.repo
# Append the yum repository machine's host entry to /etc/hosts
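# For example (the hostname "oushu-repo" below is only a placeholder; use the repo machine's real hostname):
echo "192.168.1.10 oushu-repo" >> /etc/hosts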
# Install the lava command-line management tool
yum clean all
yum makecache
yum install -y lava
Create the hivehost file:
touch ${HOME}/hivehost
Set the contents of hivehost to the hostnames of all Hive nodes:
hive1
hive2
Adjust the permissions:
chmod 777 ${HOME}/hivehost
On the first machine, exchange public keys with the other nodes in the cluster to enable passwordless ssh login and configuration file distribution:
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hivehost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hivehost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
Installation#
Preparation#
lava ssh -f ${HOME}/hivehost -e 'yum install -y hive'
# If Hive is deployed separately from HDFS, install the HDFS client (optional)
lava ssh -f ${HOME}/hivehost -e 'yum install -y hdfs'
Create the Hive directories and grant the hive user ownership:
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /data1/hdfs/hive/hdfs'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /data1/hdfs/hive'
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /etc/security/keytabs/'
The path specified by the hive.metastore.warehouse.dir parameter must be created in HDFS.
(Optional: when Kerberos is used, first configure Hive's principals on kdc1 and synchronize the keytabs before creating these paths; see "KDC Authentication for Hive" below.)
hdfs dfs -mkdir -p /usr/hive/warehouse
hdfs dfs -mkdir -p /hive/tmp
hdfs dfs -mkdir -p /usr/hive/log
hdfs dfs -chmod -R 755 /usr/hive
hdfs dfs -chmod -R 755 /hive/tmp
Edit the hive-env.sh file stored in /usr/local/oushu/conf/hive:
export JAVA_HOME=/usr/java/default/jre
KDC Authentication for Hive (Optional)#
If Kerberos is enabled, the Kerberos client must be installed on all Hive nodes.
lava ssh -f ${HOME}/hivehost -e "yum install -y krb5-libs krb5-workstation"
Create the principals and keytabs:
ssh kdc1
kadmin.local
Register Hive with the KDC:
# Create principals for the hive role
addprinc -randkey hive/hive1@KDCSERVER.OUSHU.COM
addprinc -randkey hive/hive2@KDCSERVER.OUSHU.COM
addprinc -randkey HTTP/hive1@KDCSERVER.OUSHU.COM
addprinc -randkey HTTP/hive2@KDCSERVER.OUSHU.COM
addprinc -randkey hive@KDCSERVER.OUSHU.COM
# Generate a keytab file for each principal
ktadd -k /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
ktadd -k /etc/security/keytabs/hive.keytab hive/hive2@KDCSERVER.OUSHU.COM
ktadd -k /etc/security/keytabs/hive.keytab hive@KDCSERVER.OUSHU.COM
ktadd -norandkey -k /etc/security/keytabs/hive.keytab HTTP/hive1@KDCSERVER.OUSHU.COM
ktadd -norandkey -k /etc/security/keytabs/hive.keytab HTTP/hive2@KDCSERVER.OUSHU.COM
# Exit
quit
On hive1, distribute the keytab files and adjust their permissions:
ssh hive1
scp root@kdc1:/etc/security/keytabs/hive.keytab /etc/security/keytabs/hive.keytab
scp root@kdc1:/etc/security/keytabs/hdfs.keytab /etc/security/keytabs/hdfs.keytab
scp root@kdc1:/etc/security/keytabs/yarn.keytab /etc/security/keytabs/yarn.keytab
scp root@kdc1:/etc/krb5.conf /etc/krb5.conf
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/hive.keytab =:/etc/security/keytabs/hive.keytab
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/hdfs.keytab =:/etc/security/keytabs/hdfs.keytab
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/yarn.keytab =:/etc/security/keytabs/yarn.keytab
lava scp -r -f ${HOME}/hivehost /etc/krb5.conf =:/etc/krb5.conf
lava ssh -f ${HOME}/hivehost -e 'chown hive /etc/security/keytabs/hive.keytab'
lava ssh -f ${HOME}/hivehost -e 'chmod 400 /etc/security/keytabs/hive.keytab'
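Optionally, verify on hive1 that the keytab can be used to obtain a ticket (the principal below assumes the hive1 host; use hive/hive2 on hive2):
sudo -u hive kinit -kt /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
sudo -u hive klist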
Configuration#
Metadata Database Configuration#
Edit hive-site.xml under /usr/local/oushu/conf/hive/ so that Hive uses PG:
<configuration>
<!-- PG metadata database configuration -->
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>org.postgresql.Driver</value>
<description>JDBC driver class name</description>
</property>
<property>
<name>hive.metastore.db.type</name>
<value>postgres</value>
</property>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:postgresql://PG1:4432/hive_db</value>
<description>JDBC connection URL</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>Username for connecting to the metastore database (created in PG)</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>{set the strong password of the Skylab PG here}</value>
<description>Password for connecting to the metastore database (created in PG)</description>
</property>
<property>
<name>hive.metastore.schema.verification</name>
<value>false</value>
<description>Enforce version consistency of the metastore schema</description>
</property>
</configuration>
Basic Hive Configuration#
Edit the hive-site.xml file under /usr/local/oushu/conf/hive:
<configuration>
<property>
<name>hive.exec.local.scratchdir</name>
<value>/data1/hdfs/hive/hdfs</value>
<description>Hive's local scratch directory, used to store map/reduce execution plans for the different stages</description>
</property>
<property>
<name>hive.downloaded.resources.dir</name>
<value>/data1/hdfs/hive/${hive.session.id}_resources</value>
<description>Local temporary directory for resources downloaded by Hive</description>
</property>
<property>
<name>hive.querylog.location</name>
<value>/data1/hdfs/hive/hdfs</value>
<description>Location of Hive's structured runtime logs</description>
</property>
<property>
<name>hive.server2.logging.operation.log.location</name>
<value>/data1/hdfs/hive/hdfs/operation_logs</value>
<description>Operation log location when operation logging is enabled</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/usr/hive/warehouse</value>
<description>Path of the Hive data warehouse in HDFS</description>
</property>
<property>
<name>hive.metastore.warehouse.external.dir</name>
<value></value>
</property>
<!-- HA -->
<property>
<name>hive.server2.support.dynamic.service.discovery</name>
<value>true</value>
</property>
<property>
<name>hive.server2.zookeeper.namespace</name>
<value>hiveserver2_zk</value>
</property>
<property>
<name>hive.zookeeper.quorum</name>
<value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
</property>
<property>
<name>hive.zookeeper.client.port</name>
<value>2181</value>
</property>
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive1:9083,thrift://hive2:9083</value>
<description>Thrift URIs of the remote metastores, used by metastore clients to connect to the metastore servers</description>
</property>
</configuration>
Distribute the configuration to hive2:
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/hive/* =:/usr/local/oushu/conf/hive/
Log in to hive2 and modify hive-site.xml under /usr/local/oushu/conf/hive/:
<property>
<name>hive.server2.thrift.bind.host</name>
<value>hive2</value>
</property>
Hive Tuning (Optional)#
In general, running Hive with its default parameters is recommended. If tuning is desired, adjust the resources available to Hive first; for details see the "Configuration Tuning (Optional)" part of the YARN Installation chapter. A minimal illustration follows below.
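As an illustration only (these are standard Hadoop MapReduce properties, not values prescribed by this document), per-task memory for Hive jobs can be raised in /usr/local/oushu/conf/common/mapred-site.xml:
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
<!-- Memory per map task; keep it within the YARN container limits -->
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>4096</value>
<!-- Memory per reduce task -->
</property>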
Kerberos Configuration (Optional)#
On the hive1 node:
Edit the hive-env.sh file under /usr/local/oushu/conf/hive:
export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/usr/local/oushu/conf/zookeeper/client-jaas.conf"
If ZooKeeper is not deployed on this machine, synchronize the ZooKeeper keytab locally and create the client-jaas.conf file; see ZooKeeper Installation for details.
If Hive is deployed in HA + Kerberos mode, first create the Hive path from the ZooKeeper client:
sudo -u zookeeper /usr/local/oushu/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 1] create /hiveserver2_zk
Edit the hive-site.xml file under /usr/local/oushu/conf/hive:
<property>
<name>hive.server2.enable.doAs</name>
<value>true</value>
</property>
<property>
<name>hive.server2.authentication</name>
<value>KERBEROS</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.principal</name>
<value>hive/_HOST@KDCSERVER.OUSHU.COM</value>
</property>
<property>
<name>hive.server2.authentication.kerberos.keytab</name>
<value>/etc/security/keytabs/hive.keytab</value>
</property>
<property>
<name>hive.metastore.sasl.enabled</name>
<value>true</value>
</property>
<property>
<name>hive.metastore.kerberos.keytab.file</name>
<value>/etc/security/keytabs/hive.keytab</value>
</property>
<property>
<name>hive.metastore.kerberos.principal</name>
<value>hive/_HOST@KDCSERVER.OUSHU.COM</value>
</property>
Synchronize Hive's Kerberos configuration:
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/hive/* =:/usr/local/oushu/conf/hive/
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /usr/local/oushu/conf/zookeeper/'
lava ssh -f ${HOME}/hivehost -e 'chmod -R 755 /usr/local/oushu/conf/zookeeper/'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /usr/local/oushu/conf/zookeeper/'
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/zookeeper/client-jaas.conf =:/usr/local/oushu/conf/zookeeper/
Log in to hdfs1:
ssh hdfs1
su root
Edit the core-site.xml file under /usr/local/oushu/conf/common to set Hive's proxy users; after the change, the NameNodes and DataNodes must be restarted.
<configuration>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.users</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.users</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.users</name>
<value>*</value>
</property>
</configuration>
Create hivehost on hdfs1:
touch ${HOME}/hivehost
Set the contents of hivehost to the hostnames of all Hive nodes:
hive1
hive2
Create yarnhost on hdfs1:
touch ${HOME}/yarnhost
Set the contents of yarnhost to the hostnames of all YARN nodes:
yarn1
yarn2
yarn3
On the hdfs1 machine, exchange public keys with the cluster nodes to enable passwordless ssh login and configuration file distribution:
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hivehost -p ********
lava ssh-exkeys -f ${HOME}/yarnhost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hivehost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
lava scp -f ${HOME}/yarnhost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
After modifying the HDFS configuration files, synchronize them to all HDFS nodes and restart the HDFS cluster.
If core-site.xml and other HDFS/YARN configuration files were not modified, the cluster services do not need to be restarted for the parameters to take effect.
lava scp -r -f ${HOME}/hdfshost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/core-site.xml =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/hive/
# Restart the HDFS cluster
lava ssh -f ${HOME}/nnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop namenode'
lava ssh -f ${HOME}/dnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop datanode'
lava ssh -f ${HOME}/jnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop journalnode'
lava ssh -f ${HOME}/nnhostfile -e 'sudo -E -u hdfs hdfs --daemon start namenode'
lava ssh -f ${HOME}/dnhostfile -e 'sudo -E -u hdfs hdfs --daemon start datanode'
lava ssh -f ${HOME}/jnhostfile -e 'sudo -E -u hdfs hdfs --daemon start journalnode'
# Restart the YARN cluster
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon stop nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon stop resourcemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start resourcemanager'
Startup#
Metadata Database#
On the hive1 node, as the root user, run the following commands to create the Hive metadata database:
ssh PG1
psql -d postgres -h PG1 -p 4432 -U root -Atc "create database hive_db;"
Initialize the Hive metadata:
ssh hive1
source /usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/hive/bin/schematool -dbType postgres -initSchema
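Optionally, verify the initialized schema with schematool (it prints the metastore database and schema version information):
/usr/local/oushu/hive/bin/schematool -dbType postgres -info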
Starting Hive#
If Hive is started in Kerberos + HA mode, the path required for HA must first be created in ZooKeeper, to prevent HA registration from failing when Hive itself creates the path at startup with a Kerberos-authenticated user.
Here &host+port is the address and port of any ZooKeeper node, and &hive.server2.zookeeper.namespace is the HA path configured in hive-site.xml; a concrete example follows the command below.
su hive
/usr/local/oushu/hive/bin/zkCli.sh -server &host+port create /&hive.server2.zookeeper.namespace
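For example, with the ZooKeeper address and HA namespace assumed in this document (zookeeper1:2181 and hiveserver2_zk), the command becomes:
/usr/local/oushu/hive/bin/zkCli.sh -server zookeeper1:2181 create /hiveserver2_zk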
Start Hive:
su hive
lava ssh -f /root/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f /root/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
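Optionally, confirm that the Metastore and HiveServer2 processes (both shown as RunJar by jps) are running on every Hive node:
lava ssh -f /root/hivehost -e 'jps | grep RunJar'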
Check Status#
Log in to the zookeeper1 machine:
ssh zookeeper1
su zookeeper
# Enter the ZooKeeper client and check whether HA is registered
/usr/local/oushu/zookeeper/bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /hiveserver2_zk
[serverUri=VM-128-22-centos:10000;version=3.1.3;sequence=0000000001, serverUri=vm-128-22-centos:10000;version=3.1.3;sequence=0000000000]
Run SQL to test whether Hive is usable:
# Enter the client via the hive command
hive
hive:>create database td_test;
OK
Time taken:0.201 seconds
hive:>use td_test;
OK
hive:>create table test(id int);
OK
Time taken:0.234 seconds
hive:>insert into test values(1),(2);
OK
Time taken:14.73 seconds, Fetch:1 row(s)
hive:>select * from test;
OK
1
2
Time taken: 11.48 seconds, Fetched: 2 row(s)
Register with Skylab (Optional)#
The machines where Hive is installed must be added to Skylab through machine management; if you have not added them yet, see Register Machines.
On hive1, edit the server.json configuration under /usr/local/oushu/lava/conf, replacing localhost with the IP of the Skylab server; for the installation steps of Skylab's base service lava, see: lava Installation.
Then create a ~/hive.json file with contents like the following:
{
"data": {
"name": "HiveCluster",
"group_roles": [
{
"role": "hive.metastore",
"cluster_name": "metastore-id",
"group_name": "metastore",
"machines": [
{
"id": 1,
"name": "metastore1",
"subnet": "lava",
"data_ip": "192.168.1.11",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
},{
"id": 2,
"name": "metastore2",
"subnet": "lava",
"data_ip": "192.168.1.11",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
},
{
"role": "hive.hiveservice2",
"cluster_name": "hiveservice2-id",
"group_name": "hiveservice2",
"machines": [
{
"id": 1,
"name": "hiveservice2-1",
"subnet": "lava",
"data_ip": "192.168.1.11",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
},{
"id": 2,
"name": "hiveservice2-2",
"subnet": "lava",
"data_ip": "192.168.1.11",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
}
],
"config": {
"hive-env.sh": [
{
"key": "HIVE_HOME",
"value": "/usr/local/oushu/hive"
},
{
"key": "HIVE_CONF_DIR",
"value": "/usr/local/oushu/conf/hive"
},
{
"key": "HIVE_LOG_DIR",
"value": "/usr/local/oushu/log/hive"
},
{
"key": "HADOOP_CONF_DIR",
"value": "/usr/local/oushu/conf/hive"
}
],
"hive-site.xml": [
{
"key": "hive.exec.local.scratchdir",
"value": "/data1/hdfs/hive/hdfs"
},
{
"key": "hive.querylog.location",
"value": "/data1/hdfs/hive/hdfs"
},
{
"key": "hive.metastore.warehouse.dir",
"value": "/usr/hive/warehouse"
},
{
"key": "javax.jdo.option.ConnectionDriverName",
"value": "org.postgresql.Driver"
},
{
"key": "javax.jdo.option.ConnectionURL",
"value": "jdbc:postgresql://datanode01:3306/hive_db"
},
{
"key": "hive.server2.support.dynamic.service.discovery",
"value": "true"
},
{
"key": "hive.server2.zookeeper.namespace",
"value": "2181"
},{
"key": "hive.zookeeper.client.port",
"value": "2181"
},{
"key": "hive.zookeeper.quorum",
"value": "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181"
},{
"key": "hive.metastore.uris",
"value": "thrift://hive1:9083,thrift://hive2:9083"
}
]
}
}
}
In the configuration file above, the machine information in the machines arrays must be adjusted to the actual environment. On the machine where the platform base component lava is installed, run:
psql lavaadmin -p 4432 -U oushu -c "select m.id,m.name,s.name as subnet,m.private_ip as data_ip,m.public_ip as manage_ip,m.assist_port,m.ssh_port from machine as m,subnet as s where m.subnet_id=s.id;"
to obtain the required machine information, then add each machine's information to the machines array of the service role it hosts.
For example, hive1 hosts the Hive Metastore role, so hive1's machine information must be added to the machines array of the hive.metastore role.
Use the lava command to register the cluster:
lava login -u oushu -p ********
lava onprem-register service -s Hive -f ~/hive.json
If the response is:
Add service by self success
the registration succeeded; if an error message is returned, resolve it according to the message.
After logging in through the web UI, the newly added cluster can also be seen under the corresponding service in the auto-deployment module.
Integrating Hive with Ranger Authorization (Optional)#
Ranger Installation#
If Ranger is enabled, the Ranger Hive plugin must be installed on all Hive nodes.
lava ssh -f ${HOME}/hivehost -e "yum install -y ranger-hive-plugin"
lava ssh -f ${HOME}/hivehost -e "ln -s /usr/local/oushu/conf/hive /usr/local/oushu/hive/conf"
Ranger Configuration#
On the hive1 node, edit the configuration file /usr/local/oushu/ranger-hive-plugin_2.3.0/install.properties:
POLICY_MGR_URL=http://ranger1:6080
REPOSITORY_NAME=hivedev
COMPONENT_INSTALL_DIR_NAME=/usr/local/oushu/hive
Confirm that the following parameter is configured in /usr/local/oushu/conf/hive/hive-site.xml:
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive1:9083,thrift://hive2:9083</value>
<description>Thrift URIs of the remote metastores, used by metastore clients to connect to the metastore servers</description>
</property>
Confirm that the proxy users in /usr/local/oushu/conf/common/core-site.xml have been configured; any changes must be synchronized to all HDFS nodes, followed by a restart of the NameNodes and DataNodes. Log in to hdfs1:
ssh hdfs1
su root
Edit the core-site.xml file under /usr/local/oushu/conf/common to set Hive's proxy users:
<configuration>
<property>
<name>hadoop.proxyuser.hdfs.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.HTTP.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hive.users</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.hdfs.users</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.users</name>
<value>*</value>
</property>
</configuration>
Synchronize the configuration:
lava scp -r -f ${HOME}/hdfshost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/core-site.xml =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/hive/
Synchronize Hive's Ranger configuration and run the initialization script:
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/ranger-hive-plugin_2.3.0/install.properties =:/usr/local/oushu/ranger-hive-plugin_2.3.0/
lava ssh -f ${HOME}/hivehost -e '/usr/local/oushu/ranger-hive-plugin_2.3.0/enable-hive-plugin.sh'
After the initialization script finishes, the following message indicates success; restart the service as instructed.
Ranger Plugin for hive has been enabled. Please restart hive to ensure that changes are effective.
Restart Hive:
# Hive processes can only be stopped with kill -9 <pid>
su hive
jps
* 3432 RunJar
* 2987 RunJar
kill -9 3432
kill -9 2987
exit
# Repeat the commands above on the hive2 machine
ssh hive2
su - hive
jps
* 3433 RunJar
* 2988 RunJar
kill -9 3433
kill -9 2988
exit
# Return to hive1 and start Hive
ssh hive1
su hive
source /usr/local/oushu/conf/hive/hive-env.sh
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
Configure User Authorization Policies in the Ranger UI#
Create the Hive Service#
Log in to the Ranger UI at http://ranger1:6080 and click the ➕ button to add a Hive Service; note that the tab to select is "HADOOP SQL". Fill in the service name, which must match the REPOSITORY_NAME in the install.properties file. Set a username and password of your choice and fill in the Hive connection URL; if Kerberos authentication is enabled, also supply the corresponding keytab file, otherwise use the default configuration.
Run the connection test to check that the configuration is correct, then click Add to save.
Create an Access Policy#
Find the service you just created and click its name.
Click the 'Add New Policy' button.
Set up the access policy so that the hive user has only read permission on 't1'; also make sure the recursive toggle is switched on.
Review the policy you just created.
Ranger + Kerberos Notes#
When Kerberos is enabled, Kerberos must also be enabled for the Ranger service, and the following parameter must be added when configuring the Hive repo:
The parameter's value is the configured Kerberos principal user.
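The parameter name itself is not given here. As an illustration only (these property names come from Ranger's service/repo configuration and should be confirmed against your Ranger version), the commonly used entries are:
policy.download.auth.users = hive
tag.download.auth.users = hive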
Verify the Result#
Log in to the hive1 machine and access Hive as the hive user:
sudo su hive
source /usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/hive/bin/beeline
!connect jdbc:hive2://hive1:10000
If the following output appears, the policy is in effect (it may take about a minute after the policy is configured, so retry after a while):
> use test;
> select * from t1;
OK
+---------+
|test.id |
+---------+
| 1 |
+---------+
1 row(s) selected(0.18 seconds)
> insert into t1 values(1);
Permission denied: user [hive] does not have [write] privilege on [t1]
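When Kerberos is enabled for HiveServer2, the beeline connection string typically also needs the HiveServer2 principal; a minimal sketch using the principal configured earlier in this document:
!connect jdbc:hive2://hive1:10000/default;principal=hive/hive1@KDCSERVER.OUSHU.COM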
Hive on Tez#
Prerequisites#
Complete the Hive deployment described above; Kerberos authentication does not need to be enabled.
Installation#
Tez consists of two packages, tez-minimal.tar.gz and tez.tar.gz; download both tarballs locally.
Download the Tez packages:
sudo su root
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /usr/local/oushu/tez'
lava ssh -f ${HOME}/hivehost -e 'wget <URL of the tez-minimal tarball> -O /usr/local/oushu/tez/tez-0.10.1-minimal.tar.gz'
lava ssh -f ${HOME}/hivehost -e 'wget <URL of the tez tarball> -O /usr/local/oushu/tez/tez-0.10.1.tar.gz'
Extract tez-0.10.1.tar.gz locally:
lava ssh -f ${HOME}/hivehost -e 'tar -zxvf /usr/local/oushu/tez/tez-0.10.1.tar.gz -C /usr/local/oushu/tez'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /usr/local/oushu/tez'
Configuration#
Create the Tez configuration file tez-site.xml under /usr/local/oushu/conf/hive/ and edit it:
<configuration>
<property>
<name>tez.lib.uris</name>
<value>/apps/tez/tez-0.10.1-minimal.tar.gz</value>
<!-- Points to the Tez tarball uploaded to HDFS -->
</property>
<property>
<name>tez.container.max.java.heap.fraction</name>
<!-- Tez configuration parameter -->
<value>0.2</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
<property>
<name>tez.am.am-rm.heartbeat.interval-ms.max</name>
<value>250</value>
</property>
<property>
<name>tez.am.container.idle.release-timeout-max.millis</name>
<value>20000</value>
</property>
<property>
<name>tez.am.container.idle.release-timeout-min.millis</name>
<value>10000</value>
</property>
<property>
<name>tez.am.container.reuse.enabled</name>
<value>true</value>
</property>
<property>
<name>tez.am.container.reuse.locality.delay-allocation-millis</name>
<value>250</value>
</property>
<property>
<name>tez.am.container.reuse.non-local-fallback.enabled</name>
<value>false</value>
</property>
<property>
<name>tez.am.container.reuse.rack-fallback.enabled</name>
<value>true</value>
</property>
<property>
<name>tez.am.java.opts</name>
<value>-server -Xmx1024m -Djava.net.preferIPv4Stack=true</value>
</property>
<property>
<name>tez.am.launch.cluster-default.cmd-opts</name>
<value>-server -Djava.net.preferIPv4Stack=true</value>
</property>
<property>
<name>tez.am.launch.cmd-opts</name>
<value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB</value>
</property>
<property>
<name>tez.am.launch.env</name>
<value>LD_LIBRARY_PATH=/usr/local/oushu/hdfs/lib/native</value>
</property>
<property>
<name>tez.am.log.level</name>
<value>INFO</value>
</property>
<property>
<name>tez.am.max.app.attempts</name>
<value>2</value>
</property>
<property>
<name>tez.am.maxtaskfailures.per.node</name>
<value>10</value>
</property>
<property>
<name>tez.am.resource.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>tez.am.resource.cpu.vcores</name>
<value>2</value>
</property>
<property>
<name>tez.am.view-acls</name>
<value></value>
</property>
<property>
<name>tez.counters.max</name>
<value>10000</value>
</property>
<property>
<name>tez.counters.max.groups</name>
<value>3000</value>
</property>
<property>
<name>tez.generate.debug.artifacts</name>
<value>false</value>
</property>
<property>
<name>tez.grouping.max-size</name>
<value>1073741824</value>
</property>
<property>
<name>tez.grouping.min-size</name>
<value>16777216</value>
</property>
<property>
<name>tez.grouping.split-waves</name>
<value>1.7</value>
</property>
<property>
<name>tez.queue.name</name>
<value>default</value>
</property>
<property>
<name>tez.runtime.compress</name>
<value>true</value>
</property>
<property>
<name>tez.runtime.compress.codec</name>
<value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
<name>tez.runtime.convert.user-payload.to.history-text</name>
<value>false</value>
</property>
<property>
<name>tez.runtime.io.sort.mb</name>
<value>512</value>
</property>
<property>
<name>tez.runtime.optimize.local.fetch</name>
<value>true</value>
</property>
<property>
<name>tez.runtime.pipelined.sorter.sort.threads</name>
<value>1</value>
</property>
<property>
<name>tez.runtime.shuffle.memory.limit.percent</name>
<value>0.25</value>
</property>
<property>
<name>tez.runtime.sorter.class</name>
<value>PIPELINED</value>
</property>
<property>
<name>tez.runtime.unordered.output.buffer.size-mb</name>
<value>76</value>
</property>
<property>
<name>tez.session.am.dag.submit.timeout.secs</name>
<value>600</value>
</property>
<property>
<name>tez.session.client.timeout.secs</name>
<value>-1</value>
</property>
<property>
<name>tez.shuffle-vertex-manager.max-src-fraction</name>
<value>0.4</value>
</property>
<property>
<name>tez.shuffle-vertex-manager.min-src-fraction</name>
<value>0.2</value>
</property>
<property>
<name>tez.staging-dir</name>
<value>/tmp/${user.name}/staging</value>
</property>
<property>
<name>tez.task.am.heartbeat.counter.interval-ms.max</name>
<value>4000</value>
</property>
<property>
<name>tez.task.generate.counters.per.io</name>
<value>true</value>
</property>
<property>
<name>tez.task.get-task.sleep.interval-ms.max</name>
<value>200</value>
</property>
<property>
<name>tez.task.launch.cluster-default.cmd-opts</name>
<value>-server -Djava.net.preferIPv4Stack=true</value>
</property>
<property>
<name>tez.task.launch.cmd-opts</name>
<value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB</value>
</property>
<property>
<name>tez.task.launch.env</name>
<value>LD_LIBRARY_PATH=/usr/local/oushu/hdfs/lib/native</value>
</property>
<property>
<name>tez.task.max-events-per-heartbeat</name>
<value>500</value>
</property>
<property>
<name>tez.task.resource.memory.mb</name>
<value>1024</value>
</property>
<property>
<name>tez.use.cluster.hadoop-libs</name>
<value>true</value>
</property>
<property>
<name>yarn.timeline-service.enabled</name>
<value>false</value>
</property>
<property>
<name>hive.tez.container.size</name>
<value>2048</value>
</property>
</configuration>
Synchronize Tez to the HDFS machine:
lava scp -r -f hdfs1 /usr/local/oushu/tez/tez-0.10.1-minimal.tar.gz =:/usr/local/oushu/hdfs/
# Log in to the hdfs1 machine and upload to HDFS
ssh hdfs1
su hdfs
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -copyFromLocal /usr/local/oushu/hdfs/tez-0.10.1-minimal.tar.gz /apps/tez
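# Optional: confirm that the tarball is present in HDFS
hdfs dfs -ls /apps/tez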
# Exit the hdfs user and return to hive1
exit
exit
Edit /usr/local/oushu/conf/hive/hive-site.xml so that Hive uses Tez:
<property>
<name>hive.execution.engine</name>
<value>tez</value>
</property>
Switch the MapReduce framework to yarn-tez:
Edit mapred-site.xml under /usr/local/oushu/conf/common:
<property>
<name>mapreduce.framework.name</name>
<value>yarn-tez</value>
</property>
Update the environment variables:
export TEZ_CONF_DIR=/usr/local/oushu/conf/hive/tez-site.xml
export TEZ_JARS=/usr/local/oushu/tez/
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
# Append the environment variable settings above to the files below
# If Hive and HDFS are deployed separately, also copy tez-site.xml into the HDFS configuration directory
/usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/conf/common/hadoop-env.sh
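For example (a minimal sketch; run it as a user with write permission on both files), the variables can be appended like this:
cat >> /usr/local/oushu/conf/hive/hive-env.sh <<'EOF'
export TEZ_CONF_DIR=/usr/local/oushu/conf/hive/tez-site.xml
export TEZ_JARS=/usr/local/oushu/tez/
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*
EOF
# Repeat the same append for /usr/local/oushu/conf/common/hadoop-env.sh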
Startup#
# Restart Hive
# Hive processes can only be stopped with kill -9 <pid>
su hive
jps
* 3432 RunJar
* 2987 RunJar
kill -9 3432
kill -9 2987
exit
# Repeat the commands above on the hive2 machine
ssh hive2
su - hive
jps
* 3433 RunJar
* 2988 RunJar
kill -9 3433
kill -9 2988
exit
# Return to hive1 and start Hive
ssh hive1
su hive
source /usr/local/oushu/conf/hive/hive-env.sh
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
Check Status#
# Enter the hive client
hive
# Test whether Hive works
hive:>create database td_test;
OK
Time taken:0.201 seconds
hive:>use td_test;
OK
hive:>create table test(id int);
OK
Time taken:0.234 seconds
hive:>insert into test values(1),(2);
OK
Time taken:14.73 seconds, Fetch:1 row(s)
hive:>select * from test;
Query ID = hive_20221110150743_4155afab-4bfa-4e8a-acb0-90c8c50ecfb5
Total jobs = 1
Launching Job 1 out of 1
Status: Running (Executing on YARN cluster with App id application_1478229439699_0007)
--------------------------------------------------------------------------------
VERTICES STATUS TOTAL COMPLETED RUNNING PENDING FAILED KILLED
--------------------------------------------------------------------------------
Map 1 .......... SUCCEEDED 2 2 0 0 0 0
Reducer 2 ...... SUCCEEDED 1 1 0 0 0 0
--------------------------------------------------------------------------------
VERTICES: 02/02 [==========================>>] 100% ELAPSED TIME: 10.19 s
--------------------------------------------------------------------------------
OK
1
2
Time taken: 11.48 seconds, Fetched: 2 row(s)
# The progress table above indicates that the Tez engine is being used
Hive Client Installation#
To use the hive command on machines where Hive is not deployed, install the Hive client and the HDFS client.
The Hive client machines are assumed to be hive3,hive4,hive5.
Preparation#
Create hiveclienthost on the hive1 machine:
su root
touch ${HOME}/hiveclienthost
Add the following hostnames to hiveclienthost:
hive3
hive4
hive5
Exchange public keys to enable passwordless ssh login and configuration file distribution:
# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hiveclienthost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hiveclienthost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
Installation#
lava ssh -f ${HOME}/hiveclienthost -e 'yum install -y hive'
lava ssh -f ${HOME}/hiveclienthost -e 'yum install -y hdfs mapreduce yarn'
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R hdfs:hadoop /usr/local/oushu/conf/common/'
lava scp -r -f ${HOME}/hiveclienthost /usr/local/oushu/conf/common/* =:/usr/local/oushu/conf/common/
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R hive:hadoop /usr/local/oushu/conf/hive/'
lava scp -r -f ${HOME}/hiveclienthost /usr/local/oushu/conf/hive/* =:/usr/local/oushu/conf/hive/
lava ssh -f ${HOME}/hiveclienthost -e 'sudo mkdir -p /data1/hdfs/hive/'
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R hive:hadoop /data1/hdfs/hive/'
Check#
ssh hive3
su hive
# Enter the hive client
hive
# Test whether Hive works
hive:>create database td_test;
OK
Time taken:0.201 seconds
# Any returned result confirms that the client works