Command-Line Installation#


Hive HA Deployment#

Prerequisites#

Hive depends on HDFS, YARN, and ZooKeeper clusters, and uses PostgreSQL (PG) for metadata storage.

If Hive is deployed in HA + Kerberos mode, ZooKeeper must have Kerberos authentication enabled.

For ZooKeeper installation and deployment, see: ZooKeeper Installation

The ZooKeeper service addresses are assumed to be zookeeper1:2181,zookeeper2:2181,zookeeper3:2181

For YARN installation and deployment, see: YARN Installation

The YARN service addresses are assumed to be yarn1:8090,yarn2:8090,yarn3:8090

For HDFS installation and deployment, see: HDFS Installation

The HDFS service addresses are assumed to be hdfs1:9000,hdfs2:9000,hdfs3:9000

Hive requires an external database for metadata storage; by default it uses the Postgres database of the Skylab platform itself.

The Postgres address is assumed to be PG1

If Kerberos authentication is required, a KDC service must be deployed in advance: Kerberos Installation

The KDC service address is assumed to be kdc1

If Hive is deployed separately from the HDFS/YARN cluster, the HDFS client must be installed on all Hive machines and the HDFS configuration files synchronized to them.

If Ranger authentication is enabled, see: Ranger Installation for Ranger installation and deployment

The Ranger service address is assumed to be ranger1
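
Before proceeding, it can help to confirm that the assumed service endpoints above are reachable from the Hive nodes. A minimal sketch, assuming nc is installed and that the Skylab PG listens on port 4432 (the port used by the psql commands later in this guide):

nc -z zookeeper1 2181 && echo "zookeeper1 ok"
nc -z hdfs1 9000 && echo "hdfs1 ok"
nc -z yarn1 8090 && echo "yarn1 ok"
nc -z PG1 4432 && echo "PG1 ok"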

Configure the yum Repository and Install lava#

Log in to the hive1 machine and switch to the root user

ssh hive1
su root

Configure the yum repository and install the lava command-line management tool

# Fetch the repo file from the machine hosting the yum repository (assumed to be 192.168.1.10)
scp root@192.168.1.10:/etc/yum.repos.d/oushu.repo /etc/yum.repos.d/oushu.repo
# Append the yum repository machine's host entry to /etc/hosts
# Install the lava command-line management tool
yum clean all
yum makecache
yum install -y lava
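
The comment above mentions appending the yum repository machine to /etc/hosts without showing the command. A minimal sketch, assuming the repository machine is 192.168.1.10 and is referred to by the hypothetical hostname yumrepo:

echo "192.168.1.10 yumrepo" >> /etc/hosts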

Create the hivehost file:

touch ${HOME}/hivehost

Populate hivehost with the hostnames of all Hive nodes:

hive1
hive2

Change its permissions:

chmod 777 ${HOME}/hivehost

Exchange public keys between this first machine and the other nodes in the cluster to enable passwordless ssh login and configuration distribution

# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hivehost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hivehost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d

Installation#

Preparation#

lava ssh -f ${HOME}/hivehost -e 'yum install -y hive'
# If Hive is deployed separately from HDFS, the HDFS client must also be installed (optional)
lava ssh -f ${HOME}/hivehost -e 'yum install -y hdfs'

Create the Hive directories and grant ownership to the hive user

lava ssh -f ${HOME}/hivehost -e 'mkdir -p /data1/hdfs/hive/hdfs'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /data1/hdfs/hive'
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /etc/security/keytabs/'

The path specified by the hive.metastore.warehouse.dir parameter must be created in HDFS.

(Optional: when Kerberos is used, first configure Hive's principals on kdc1 and synchronize the keytabs before creating these paths; see "Hive KDC Authentication" below.)

hdfs dfs -mkdir -p /usr/hive/warehouse  
hdfs dfs -mkdir -p /hive/tmp  
hdfs dfs -mkdir -p /usr/hive/log  
hdfs dfs -chmod -R 755 /usr/hive
hdfs dfs -chmod -R 755 /hive/tmp  
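
Optionally, verify that the directories were created with the expected permissions; a sketch assuming the local HDFS client can reach the cluster:

hdfs dfs -ls -d /usr/hive/warehouse /usr/hive/log /hive/tmp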

Edit the hive-env.sh file stored in /usr/local/oushu/conf/hive

export JAVA_HOME=/usr/java/default/jre
Hive KDC Authentication (Optional)#

If Kerberos is enabled, the Kerberos client must be installed on all Hive nodes.

lava ssh -f ${HOME}/hivehost -e "yum install -y krb5-libs krb5-workstation"

Create the principals and keytabs

ssh kdc1
kadmin.local

Set up KDC authentication for Hive

# Create principals for the hive role
addprinc -randkey hive/hive1@KDCSERVER.OUSHU.COM 
addprinc -randkey hive/hive2@KDCSERVER.OUSHU.COM 
addprinc -randkey HTTP/hive1@KDCSERVER.OUSHU.COM
addprinc -randkey HTTP/hive2@KDCSERVER.OUSHU.COM
addprinc -randkey hive@KDCSERVER.OUSHU.COM
# Generate keytab files for each principal
ktadd -k /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
ktadd -k /etc/security/keytabs/hive.keytab hive/hive2@KDCSERVER.OUSHU.COM
ktadd -k /etc/security/keytabs/hive.keytab hive@KDCSERVER.OUSHU.COM
ktadd -norandkey -k /etc/security/keytabs/hive.keytab HTTP/hive1@KDCSERVER.OUSHU.COM
ktadd -norandkey -k /etc/security/keytabs/hive.keytab HTTP/hive2@KDCSERVER.OUSHU.COM
# Exit
quit

On hive1, distribute the keytab files and adjust their permissions

ssh hive1
scp root@kdc1:/etc/security/keytabs/hive.keytab /etc/security/keytabs/hive.keytab
scp root@kdc1:/etc/security/keytabs/hdfs.keytab /etc/security/keytabs/hdfs.keytab
scp root@kdc1:/etc/security/keytabs/yarn.keytab /etc/security/keytabs/yarn.keytab
scp root@kdc1:/etc/krb5.conf  /etc/krb5.conf 

lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/hive.keytab =:/etc/security/keytabs/hive.keytab
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/hdfs.keytab =:/etc/security/keytabs/hdfs.keytab
lava scp -r -f ${HOME}/hivehost /etc/security/keytabs/yarn.keytab =:/etc/security/keytabs/yarn.keytab
lava scp -r -f ${HOME}/hivehost /etc/krb5.conf  =:/etc/krb5.conf 
lava ssh -f ${HOME}/hivehost -e 'chown hive /etc/security/keytabs/hive.keytab'
lava ssh -f ${HOME}/hivehost -e 'chmod 400 /etc/security/keytabs/hive.keytab'
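
To confirm that the keytab works on each node, you can obtain a ticket with it (a sketch; use the principal matching the local hostname, shown here for hive1 with the KDCSERVER.OUSHU.COM realm assumed in this guide):

sudo -u hive kinit -kt /etc/security/keytabs/hive.keytab hive/hive1@KDCSERVER.OUSHU.COM
sudo -u hive klist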

Configuration#

Metastore Database Configuration#

Edit hive-site.xml under /usr/local/oushu/conf/hive/ so that Hive uses PG

<configuration>
<!-- PG metastore database configuration -->
   <property>
     <name>javax.jdo.option.ConnectionDriverName</name>
     <value>org.postgresql.Driver</value>
     <description>JDBC driver class name</description>
   </property>
  
    <property>
      <name>hive.metastore.db.type</name>
      <value>postgres</value>
    </property>
  
   <property>
     <name>javax.jdo.option.ConnectionURL</name>
     <value>jdbc:postgresql://PG1:4432/hive_db</value>
     <description>JDBC connection URL</description>
   </property>

   <property>
     <name>javax.jdo.option.ConnectionUserName</name>
     <value>hive</value>
     <description>Username for connecting to the metastore database (created in PG)</description>
   </property>

   <property>
     <name>javax.jdo.option.ConnectionPassword</name>
     <value>{set the strong password of the Skylab PG here}</value>
     <description>Password for connecting to the metastore database (created in PG)</description>
   </property>

   <property>
     <name>hive.metastore.schema.verification</name>
     <value>false</value>
     <description>Whether to enforce metastore schema version consistency</description>
   </property>
</configuration>
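
Before initializing the schema, it can be worth confirming that the configured PG instance accepts connections. A hedged check, assuming the Skylab PG listens on PG1:4432 and the hive role already exists (hive_db itself is created in the Startup section below):

psql -h PG1 -p 4432 -U hive -d postgres -c "select version();"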

Basic Hive Configuration#

Edit the hive-site.xml file in /usr/local/oushu/conf/hive

<configuration>
<property>
     <name>hive.exec.local.scratchdir</name>
     <value>/data1/hdfs/hive/hdfs</value>
     <description>Hive's local scratch directory, used to store map/reduce execution plans for the different stages</description>
   </property>

   <property>
     <name>hive.downloaded.resources.dir</name>
     <value>/data1/hdfs/hive/${hive.session.id}_resources</value>
     <description>Local directory for resources downloaded by Hive</description>
   </property>

   <property>
     <name>hive.querylog.location</name>
     <value>/data1/hdfs/hive/hdfs</value>
     <description>Location of Hive's structured runtime query logs</description>
   </property>

   <property>
     <name>hive.server2.logging.operation.log.location</name>
     <value>/data1/hdfs/hive/hdfs/operation_logs</value>
     <description>Location of operation logs when operation logging is enabled</description>
   </property>

  <property>
    <name>hive.metastore.warehouse.dir</name>
    <value>/usr/hive/warehouse</value>
    <description>Path of the Hive data warehouse in HDFS</description>
  </property>
    <property>
      <name>hive.metastore.warehouse.external.dir</name>
      <value></value>
    </property>
    
    <!-- HA    -->
    <property>
        <name>hive.server2.support.dynamic.service.discovery</name>
        <value>true</value>
    </property>
     
    <property>
        <name>hive.server2.zookeeper.namespace</name>
        <value>hiveserver2_zk</value>
    </property>
    
    <property>
        <name>hive.zookeeper.quorum</name>
        <value>zookeeper1:2181,zookeeper2:2181,zookeeper3:2181</value>
    </property>
     
    <property>
        <name>hive.zookeeper.client.port</name>
        <value>2181</value>
    </property>

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hive1:9083,thrift://hive2:9083</value>
        <description>Thrift URIs of the remote metastore, used by metastore clients to connect to the metastore server</description>
    </property>

</configuration>

Distribute the configuration to hive2

lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/hive/*  =:/usr/local/oushu/conf/hive/

Log in to hive2 and modify hive-site.xml under /usr/local/oushu/conf/hive/ as follows

  <property>
 	<name>hive.server2.thrift.bind.host</name>
  	<value>hive2</value>
  </property>
Hive Tuning (Optional)#

Hive is generally recommended to run with its default parameters. If tuning is desired, adjust the resources available to Hive first; see the "Tuning (Optional)" part of the YARN Installation chapter.

Kerberos Configuration (Optional)#

On the hive1 node

Edit the hive-env.sh file in /usr/local/oushu/conf/hive

export CLIENT_JVMFLAGS="-Djava.security.auth.login.config=/usr/local/oushu/conf/zookeeper/client-jaas.conf"

If ZooKeeper is not deployed on this machine, synchronize the ZooKeeper keytab locally and create the client-jaas.conf file; see ZooKeeper Installation for details

If Hive is deployed in HA + Kerberos mode, the Hive path must first be created from the ZooKeeper client

sudo -u zookeeper /usr/local/oushu/zookeeper/bin/zkCli.sh

[zk: localhost:2181(CONNECTED) 1] create /hiveserver2_zk

Edit the hive-site.xml file in /usr/local/oushu/conf/hive

<property>
       <name>hive.server2.enable.doAs</name>
       <value>true</value>
 </property>
 <property>
       <name>hive.server2.authentication</name>
       <value>KERBEROS</value>
 </property>
 <property>
       <name>hive.server2.authentication.kerberos.principal</name>
       <value>hive/_HOST@KDCSERVER.OUSHU.COM</value>
 </property>
 <property>
       <name>hive.server2.authentication.kerberos.keytab</name>
       <value>/etc/security/keytabs/hive.keytab</value>
 </property>
 <property>
       <name>hive.metastore.sasl.enabled</name>
       <value>true</value>
 </property>
 <property>
       <name>hive.metastore.kerberos.keytab.file</name>
       <value>/etc/security/keytabs/hive.keytab</value>
 </property>
 <property>
       <name>hive.metastore.kerberos.principal</name>
       <value>hive/_HOST@KDCSERVER.OUSHU.COM</value>
 </property>
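
Once HiveServer2 is running with these settings (see the Startup section), clients must include the server principal in the JDBC URL. A hedged beeline example using the hive1 host and realm assumed in this guide:

/usr/local/oushu/hive/bin/beeline
!connect jdbc:hive2://hive1:10000/default;principal=hive/hive1@KDCSERVER.OUSHU.COM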

Synchronize Hive's Kerberos configuration

lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/hive/*  =:/usr/local/oushu/conf/hive/
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /usr/local/oushu/conf/zookeeper/'
lava ssh -f ${HOME}/hivehost -e 'chmod -R 755 /usr/local/oushu/conf/zookeeper/'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /usr/local/oushu/conf/zookeeper/'
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/zookeeper/client-jaas.conf  =:/usr/local/oushu/conf/zookeeper/

Log in to hdfs1

ssh hdfs1
su root

Edit the core-site.xml file in /usr/local/oushu/conf/common to set Hive's proxy users; after the change, the NameNode and DataNode must be restarted

<configuration>
<property>
  <name>hadoop.proxyuser.hdfs.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.HTTP.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.HTTP.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.users</name>
  <value>*</value>
</property>
</configuration>

Create hivehost on hdfs1:

touch ${HOME}/hivehost

Populate hivehost with the hostnames of all Hive nodes:

hive1
hive2

Create yarnhost on hdfs1:

touch ${HOME}/yarnhost

Populate yarnhost with the hostnames of all YARN nodes:

yarn1
yarn2
yarn3

On the hdfs1 machine, exchange public keys with the cluster nodes to enable passwordless ssh login and configuration distribution

# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hivehost -p ********
lava ssh-exkeys -f ${HOME}/yarnhost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hivehost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
lava scp -f ${HOME}/yarnhost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d

After modifying the HDFS configuration files, synchronize them to all HDFS nodes and restart the HDFS cluster

If core-site.xml and the other HDFS/YARN-related configuration files were not modified, there is no need to restart the cluster services for the parameters to take effect.

lava scp -r -f ${HOME}/hdfshost /usr/local/oushu/conf/common/*  =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/core-site.xml  =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/common/*  =:/usr/local/oushu/conf/hive/
# Restart the HDFS cluster
lava ssh -f ${HOME}/nnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop namenode'
lava ssh -f ${HOME}/dnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop datanode'
lava ssh -f ${HOME}/jnhostfile -e 'sudo -E -u hdfs hdfs --daemon stop journalnode'

lava ssh -f ${HOME}/nnhostfile -e 'sudo -E -u hdfs hdfs --daemon start namenode'
lava ssh -f ${HOME}/dnhostfile -e 'sudo -E -u hdfs hdfs --daemon start datanode'
lava ssh -f ${HOME}/jnhostfile -e 'sudo -E -u hdfs hdfs --daemon start journalnode'

# Restart the YARN cluster
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon stop nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon stop resourcemanager'

lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start resourcemanager'

Startup#

Metastore Database#

Run the following commands as the root user to create the Hive metastore database on PG1

ssh PG1
psql -d postgres -h PG1 -p 4432 -U root -Atc "create database hive_db;"

Initialize the Hive metastore

ssh hive1
source /usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/hive/bin/schematool -dbType postgres -initSchema
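
Optionally, check that initialization succeeded; schematool can report the schema version it finds in the metastore database (a sketch under the same environment as above):

/usr/local/oushu/hive/bin/schematool -dbType postgres -info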

Starting Hive#

If Hive is started in Kerberos + HA mode, first create the HA path on ZooKeeper; otherwise Hive may create the path itself at startup with a Kerberos-restricted user, causing HA registration to fail.

Here &host+port is the address and port of any ZooKeeper node, and &hive.server2.zookeeper.namespace is the HA path configured in hive-site.xml.

su hive
/usr/local/oushu/zookeeper/bin/zkCli.sh -server &host+port create /&hive.server2.zookeeper.namespace

Start Hive

su hive
lava ssh -f /root/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f /root/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'
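
To confirm that the metastore and HiveServer2 came up on every node, you can look for the two RunJar processes and the listening ports (a sketch; 9083 and 10000 are the default metastore and HiveServer2 ports, and ss is assumed to be available):

lava ssh -f /root/hivehost -e 'jps | grep RunJar'
lava ssh -f /root/hivehost -e 'ss -lntp | grep -E ":9083|:10000"'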

Check Status#

Log in to the zookeeper1 machine

ssh zookeeper1
su zookeeper
# Enter the zookeeper client and check whether the HA instances are registered
/usr/local/oushu/zookeeper/bin/zkCli.sh 
[zk: localhost:2181(CONNECTED) 1] ls /hiveserver2_zk

[serverUri=VM-128-22-centos:10000;version=3.1.3;sequence=0000000001, serverUri=vm-128-22-centos:10000;version=3.1.3;sequence=0000000000]
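
With dynamic service discovery enabled, clients can also connect through ZooKeeper instead of a fixed HiveServer2 address. A hedged beeline example using the quorum and namespace configured above (if Kerberos is enabled, append the principal parameter as shown in the Kerberos section):

/usr/local/oushu/hive/bin/beeline -u "jdbc:hive2://zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2_zk"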

Run some SQL to test that Hive is usable

# Enter the client via the hive command
hive

hive:>create database td_test;
OK 
Time taken:0.201 seconds

hive:>use td_test;
OK 

hive:>create table test(id int);
OK 
Time taken:0.234 seconds

hive:>insert into test values(1),(2);
OK 
Time taken:14.73 seconds, Fetch:1 row(s)

hive:>select * from test;
OK
1       
2       
Time taken: 11.48 seconds, Fetched: 2 row(s)

Register with Skylab (Optional)#

The machines where Hive is installed need to be added to Skylab through machine management; if you have not added them yet, see Register Machines

On hive1, edit server.json in /usr/local/oushu/lava/conf, replacing localhost with the Skylab server IP; for the installation of Skylab's base service lava, see: lava Installation

Then create the ~/hive.json file with contents like the following:

{
    "data": {
        "name": "HiveCluster",
        "group_roles": [
            {
                "role": "hive.metastore",
                "cluster_name": "metastore-id",
                "group_name": "metastore",
                "machines": [
                    {
                        "id": 1,
                        "name": "metastore1",
                        "subnet": "lava",
                        "data_ip": "192.168.1.11",
                        "manage_ip": "",
                        "assist_port": 1622,
                        "ssh_port": 22
                    },{
                    "id": 2,
                    "name": "metastore2",
                    "subnet": "lava",
                    "data_ip": "192.168.1.11",
                    "manage_ip": "",
                    "assist_port": 1622,
                    "ssh_port": 22
                  }
                ]
            },
          {
            "role": "hive.hiveservice2",
            "cluster_name": "hiveservice2-id",
            "group_name": "hiveservice2",
            "machines": [
              {
                "id": 1,
                "name": "hiveservice2-1",
                "subnet": "lava",
                "data_ip": "192.168.1.11",
                "manage_ip": "",
                "assist_port": 1622,
                "ssh_port": 22
              },{
                "id": 2,
                "name": "hiveservice2-2",
                "subnet": "lava",
                "data_ip": "192.168.1.11",
                "manage_ip": "",
                "assist_port": 1622,
                "ssh_port": 22
              }
            ]
          }
        ],
        "config": {
            "hive-env.sh": [
                {
                    "key": "HIVE_HOME",
                    "value": "/usr/local/oushu/hive"
                },
                {
                    "key": "HIVE_CONF_DIR",
                    "value": "/usr/local/oushu/conf/hive"
                },
              {
                "key": "HIVE_LOG_DIR",
                "value": "/usr/local/oushu/log/hive"
              },
              {
                "key": "HADOOP_CONF_DIR",
                "value": "/usr/local/oushu/conf/hive"
              }
            ],
          "hive-site.xml": [
            {
              "key": "hive.exec.local.scratchdir",
              "value": "/data1/hdfs/hive/hdfs"
            },
            {
              "key": "hive.querylog.location",
              "value": "/data1/hdfs/hive/hdfs"
            },
            {
              "key": "hive.metastore.warehouse.dir",
              "value": "/usr/hive/warehouse"
            },
            {
              "key": "javax.jdo.option.ConnectionDriverName",
              "value": "org.postgresql.Driver"
            },
            {
              "key": "javax.jdo.option.ConnectionURL",
              "value": "jdbc:postgresql://datanode01:3306/hive_db"
            },
            {
              "key": "hive.server2.support.dynamic.service.discovery",
              "value": "true"
            },
            {
              "key": "hive.server2.zookeeper.namespace",
              "value": "2181"
            },{
              "key": "hive.zookeeper.client.port",
              "value": "2181"
            },{
              "key": "hive.zookeeper.quorum",
              "value": "zookeeper1:2181,zookeeper2:2181,zookeeper3:2181"
            },{
              "key": "hive.metastore.uris",
              "value": "thrift://hive1:9083,thrift://hive2:9083"
            }
          ]
        }
    }
}

In the configuration file above, the machine information in the machines arrays must be adjusted to your actual environment. On the machine where the platform base component lava is installed, run:

psql lavaadmin -p 4432 -U oushu -c "select m.id,m.name,s.name as subnet,m.private_ip as data_ip,m.public_ip as manage_ip,m.assist_port,m.ssh_port from machine as m,subnet as s where m.subnet_id=s.id;"

This returns the required machine information; add it to the machines arrays according to the nodes each service role runs on.

For example, hive1 hosts the Hive Metastore role, so hive1's machine information must be added to the machines array of the hive.metastore role.

Register the cluster with the lava command:

lava login -u oushu -p ********
lava onprem-register service -s Hive -f ~/hive.json

If the command returns:

Add service by self success

the registration succeeded; if an error message is returned, resolve it according to the message.

After logging in to the web UI, the newly added cluster can also be seen under the corresponding service in the automatic deployment module.

Integrating Hive with Ranger (Optional)#

Ranger Plugin Installation#

If Ranger is enabled, the Ranger client (hive plugin) must be installed on all Hive nodes.

lava ssh -f ${HOME}/hivehost -e "yum install -y ranger-hive-plugin"
lava ssh -f ${HOME}/hivehost -e "ln -s /usr/local/oushu/conf/hive /usr/local/oushu/hive/conf"

Ranger Configuration#

On the hive1 node, edit the configuration file /usr/local/oushu/ranger-hive-plugin_2.3.0/install.properties

POLICY_MGR_URL=http://ranger1:6080
REPOSITORY_NAME=hivedev
COMPONENT_INSTALL_DIR_NAME=/usr/local/oushu/hive

Confirm that the following parameter is configured in /usr/local/oushu/conf/hive/hive-site.xml

    <property>
        <name>hive.metastore.uris</name>
        <value>thrift://hive1:9083,thrift://hive2:9083</value>
        <description>Thrift URIs of the remote metastore, used by metastore clients to connect to the metastore server</description>
    </property>

Confirm that the proxy users in /usr/local/oushu/conf/common/core-site.xml have been set; any change must be synchronized to all HDFS nodes and the NameNode and DataNode restarted. Log in to hdfs1

ssh hdfs1
su root

Edit the core-site.xml file in /usr/local/oushu/conf/common to set Hive's proxy users

<configuration>
<property>
  <name>hadoop.proxyuser.hdfs.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.HTTP.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.HTTP.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hive.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hdfs.users</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.root.users</name>
  <value>*</value>
</property>
</configuration>

Synchronize the configuration

lava scp -r -f ${HOME}/hdfshost /usr/local/oushu/conf/common/*  =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/yarnhost /usr/local/oushu/conf/common/core-site.xml  =:/usr/local/oushu/conf/common/
lava scp -r -f ${HOME}/hivehost /usr/local/oushu/conf/common/*  =:/usr/local/oushu/conf/hive/

Synchronize Hive's Ranger configuration and run the plugin initialization script

lava scp -r -f ${HOME}/hivehost /usr/local/oushu/ranger-hive-plugin_2.3.0/install.properties  =:/usr/local/oushu/ranger-hive-plugin_2.3.0/
lava ssh -f ${HOME}/hivehost -e '/usr/local/oushu/ranger-hive-plugin_2.3.0/enable-hive-plugin.sh'

After the initialization script finishes, the following message indicates success; restart the service as instructed.

Ranger Plugin for hive has been enabled. Please restart hive to ensure that changes are effective.

Restart Hive

# Hive processes can only be stopped with kill -9 <pid>
su hive
jps
* 3432 RunJar
* 2987 RunJar
kill -9 3432
kill -9 2987
exit

# The commands above must be repeated on the hive2 machine
ssh hive2
su - hive
jps
* 3433 RunJar
* 2988 RunJar
kill -9 3433
kill -9 2988
exit

# Return to hive1 and start Hive
ssh hive1
su hive
source /usr/local/oushu/conf/hive/hive-env.sh
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'

Configure User Permission Policies in the Ranger UI#

Create the Hive Service#
  • Log in to the Ranger UI at http://ranger1:6080 and click the ➕ button to add a Hive Service; note that the tab to select is "HADOOP SQL" image

  • Fill in the service name; it must match the REPOSITORY_NAME set in the install.properties file image

  • The username and password are user-defined; fill in the Hive connection URL. If Kerberos authentication is enabled, supply the corresponding keytab file; otherwise use the default configuration image

  • Run the test to check that the configuration is correct, then click Add to save it. image

Create an Access Policy#
  • Find the service just created and click its name image

  • Click the 'Add New Policy' button image

  • Set the access policy so that the hive user has only read permission on 't1', and make sure the recursive slider is enabled image image image image

  • Review the settings just made image

Ranger + Kerberos Notes#

When Kerberos is enabled, the Ranger service must also have Kerberos enabled, and the following parameter must be added when configuring the Hive repo:

image

The parameter value is the configured Kerberos principal username.

Verify the Result#

Log in to the hive1 machine and access Hive as the hive user

sudo su hive
source /usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/hive/bin/beeline

!connect jdbc:hive2://hive1:10000

If the following output appears, the policy is in effect (it may take about a minute after the policy is configured; retry after a moment)

> use test;
> select * from t1;

OK
+---------+
|test.id  |     
+---------+
| 1       |
+---------+
1 row(s) selected(0.18 seconds)

> insert into t1 values(1);

Permission denied: user [hive] does not have [write] privilege on [t1]

Hive on Tez#

Prerequisites#

Complete the Hive deployment described above; Kerberos authentication does not need to be enabled.

Installation#

Tez consists of two packages, tez-minimal.tar and tez.tar; download both tarballs locally.

Download the Tez packages:

sudo su root
lava ssh -f ${HOME}/hivehost -e 'mkdir -p /usr/local/oushu/tez'
lava ssh -f ${HOME}/hivehost -e 'wget <URL of the tez-minimal tarball> -O /usr/local/oushu/tez/tez-0.10.1-minimal.tar.gz'
lava ssh -f ${HOME}/hivehost -e 'wget <URL of the tez tarball> -O /usr/local/oushu/tez/tez-0.10.1.tar.gz'

Extract tez-0.10.1.tar.gz locally

lava ssh -f ${HOME}/hivehost -e 'tar -zxvf /usr/local/oushu/tez/tez-0.10.1.tar.gz -C  /usr/local/oushu/tez'
lava ssh -f ${HOME}/hivehost -e 'chown -R hive:hadoop /usr/local/oushu/tez'

Configuration#

Create the Tez configuration file tez-site.xml under /usr/local/oushu/conf/hive/ with the following contents:

<configuration>
	<property>
		<name>tez.lib.uris</name>
		<value>/apps/tez/tez-0.10.1-minimal.tar.gz</value>
		<!-- This points to the tez tarball uploaded to HDFS -->
	</property>
	<property>
		<name>tez.container.max.java.heap.fraction</name>
		<!-- Tez configuration parameter -->
		<value>0.2</value>
	</property>
    <property>
        <name>tez.am.am-rm.heartbeat.interval-ms.max</name>
        <value>250</value>
    </property>
    <property>
        <name>tez.am.container.idle.release-timeout-max.millis</name>
        <value>20000</value>
    </property>
    <property>
        <name>tez.am.container.idle.release-timeout-min.millis</name>
        <value>10000</value>
    </property>
    <property>
        <name>tez.am.container.reuse.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.am.container.reuse.locality.delay-allocation-millis</name>
        <value>250</value>
    </property>
    <property>
        <name>tez.am.container.reuse.non-local-fallback.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>tez.am.container.reuse.rack-fallback.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.am.java.opts</name>
        <value>-server -Xmx1024m -Djava.net.preferIPv4Stack=true</value>
    </property>
    <property>
        <name>tez.am.launch.cluster-default.cmd-opts</name>
        <value>-server -Djava.net.preferIPv4Stack=true</value>
    </property>
    <property>
        <name>tez.am.launch.cmd-opts</name>
        <value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB</value>
    </property>
    <property>
        <name>tez.am.launch.env</name>
        <value>LD_LIBRARY_PATH=/usr/local/oushu/hdfs/lib/native</value>
    </property>
    <property>
        <name>tez.am.log.level</name>
        <value>INFO</value>
    </property>
    <property>
        <name>tez.am.max.app.attempts</name>
        <value>2</value>
    </property>
    <property>
        <name>tez.am.maxtaskfailures.per.node</name>
        <value>10</value>
    </property>
    <property>
        <name>tez.am.resource.memory.mb</name>
        <value>2048</value>
    </property>
    <property>
    <name>tez.am.resource.cpu.vcores</name>
        <value>2</value>
    </property>
    <property>
        <name>tez.am.view-acls</name>
        <value></value>
    </property>
    <property>
        <name>tez.counters.max</name>
        <value>10000</value>
    </property>
    <property>
        <name>tez.counters.max.groups</name>
        <value>3000</value>
    </property>
    <property>
        <name>tez.generate.debug.artifacts</name>
        <value>false</value>
    </property>
    <property>
        <name>tez.grouping.max-size</name>
        <value>1073741824</value>
    </property>
    <property>
        <name>tez.grouping.min-size</name>
        <value>16777216</value>
    </property>
    <property>
        <name>tez.grouping.split-waves</name>
        <value>1.7</value>
    </property>
    <property>
        <name>tez.queue.name</name>
        <value>default</value>
    </property>
    <property>
        <name>tez.runtime.compress</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.runtime.compress.codec</name>
        <value>org.apache.hadoop.io.compress.SnappyCodec</value>
    </property>
    <property>
        <name>tez.runtime.convert.user-payload.to.history-text</name>
        <value>false</value>
    </property>
    <property>
        <name>tez.runtime.io.sort.mb</name>
        <value>512</value>
    </property>
    <property>
        <name>tez.runtime.optimize.local.fetch</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.runtime.pipelined.sorter.sort.threads</name>
        <value>1</value>
    </property>
    <property>
        <name>tez.runtime.shuffle.memory.limit.percent</name>
        <value>0.25</value>
    </property>
    <property>
        <name>tez.runtime.sorter.class</name>
        <value>PIPELINED</value>
    </property>
    <property>
        <name>tez.runtime.unordered.output.buffer.size-mb</name>
        <value>76</value>
    </property>
    <property>
        <name>tez.session.am.dag.submit.timeout.secs</name>
        <value>600</value>
    </property>
    <property>
        <name>tez.session.client.timeout.secs</name>
        <value>-1</value>
    </property>
    <property>
        <name>tez.shuffle-vertex-manager.max-src-fraction</name>
        <value>0.4</value>
    </property>
    <property>
        <name>tez.shuffle-vertex-manager.min-src-fraction</name>
        <value>0.2</value>
    </property>
    <property>
        <name>tez.staging-dir</name>
        <value>/tmp/${user.name}/staging</value>
    </property>
    <property>
        <name>tez.task.am.heartbeat.counter.interval-ms.max</name>
        <value>4000</value>
    </property>
    <property>
        <name>tez.task.generate.counters.per.io</name>
        <value>true</value>
    </property>
    <property>
        <name>tez.task.get-task.sleep.interval-ms.max</name>
        <value>200</value>
    </property>
    <property>
        <name>tez.task.launch.cluster-default.cmd-opts</name>
        <value>-server -Djava.net.preferIPv4Stack=true</value>
    </property>
    <property>
        <name>tez.task.launch.cmd-opts</name>
        <value>-XX:+PrintGCDetails -verbose:gc -XX:+PrintGCTimeStamps -XX:+UseNUMA -XX:+UseG1GC -XX:+ResizeTLAB</value>
    </property>
    <property>
        <name>tez.task.launch.env</name>
        <value>LD_LIBRARY_PATH=/usr/local/oushu/hdfs/lib/native</value>
    </property>
    <property>
        <name>tez.task.max-events-per-heartbeat</name>
        <value>500</value>
    </property>
    <property>
        <name>tez.task.resource.memory.mb</name>
        <value>1024</value>
    </property>
    <property>
        <name>tez.use.cluster.hadoop-libs</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.timeline-service.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>hive.tez.container.size</name>
        <value>2048</value>
    </property>
</configuration>

Synchronize Tez to the HDFS machine:

lava scp -r -f hdfs1 /usr/local/oushu/tez/tez-0.10.1-minimal.tar.gz =:/usr/local/oushu/hdfs/

# Log in to the hdfs1 machine and upload to HDFS
ssh hdfs1
su hdfs
hdfs dfs -mkdir -p /apps/tez
hdfs dfs -copyFromLocal /usr/local/oushu/hdfs/tez-0.10.1-minimal.tar.gz /apps/tez

# Exit the hdfs user and return to hive1
exit
exit
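
Optionally confirm the upload before pointing tez.lib.uris at it (a sketch; run from any node with an HDFS client):

sudo -u hdfs hdfs dfs -ls /apps/tez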

Edit /usr/local/oushu/conf/hive/hive-site.xml so that Hive uses Tez

<property>
    <name>hive.execution.engine</name>
    <value>tez</value>
</property>

Switch the MapReduce framework to yarn-tez

Edit mapred-site.xml under /usr/local/oushu/conf/common

<property>
   <name>mapreduce.framework.name</name>
   <value>yarn-tez</value>
</property>

Update the environment variables

export TEZ_CONF_DIR=/usr/local/oushu/conf/hive/tez-site.xml
export TEZ_JARS=/usr/local/oushu/tez/
export HADOOP_CLASSPATH=${HADOOP_CLASSPATH}:${TEZ_CONF_DIR}:${TEZ_JARS}/*:${TEZ_JARS}/lib/*

# Append the environment variable settings above to the files below
# If Hive and HDFS are deployed separately, tez-site.xml must also be copied into the HDFS configuration directory
/usr/local/oushu/conf/hive/hive-env.sh
/usr/local/oushu/conf/common/hadoop-env.sh
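
One way to perform the append on every Hive node, shown as a sketch: it assumes the three export lines have been saved on hive1 as ${HOME}/tez-env.snippet (a hypothetical file name):

lava scp -f ${HOME}/hivehost ${HOME}/tez-env.snippet =:/tmp/tez-env.snippet
lava ssh -f ${HOME}/hivehost -e 'cat /tmp/tez-env.snippet >> /usr/local/oushu/conf/hive/hive-env.sh'
lava ssh -f ${HOME}/hivehost -e 'cat /tmp/tez-env.snippet >> /usr/local/oushu/conf/common/hadoop-env.sh'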

Startup#

# Restart Hive
# Hive processes can only be stopped with kill -9 <pid>
su hive
jps
* 3432 RunJar
* 2987 RunJar
kill -9 3432
kill -9 2987
exit

# The commands above must be repeated on the hive2 machine
ssh hive2
su - hive
jps
* 3433 RunJar
* 2988 RunJar
kill -9 3433
kill -9 2988
exit

# Return to hive1 and start Hive
ssh hive1
su hive
source /usr/local/oushu/conf/hive/hive-env.sh
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service metastore >/dev/null 2>&1 &'
lava ssh -f ${HOME}/hivehost -e 'nohup hive --service hiveserver2 >/dev/null 2>&1 &'

Check Status#

# Enter the hive client
hive
# Test that Hive is usable
hive:>create database td_test;
OK 
Time taken:0.201 seconds

hive:>use td_test;
OK 

hive:>create table test(id int);
OK 
Time taken:0.234 seconds

hive:>insert into test values(1),(2);
OK 
Time taken:14.73 seconds, Fetch:1 row(s)

hive:>select * from test;

Query ID = hive_20221110150743_4155afab-4bfa-4e8a-acb0-90c8c50ecfb5
Total jobs = 1
Launching Job 1 out of 1
 
Status: Running (Executing on YARN cluster with App id application_1478229439699_0007)
 
--------------------------------------------------------------------------------
        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------
Map 1 ..........   SUCCEEDED      2          2        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------
VERTICES: 02/02  [==========================>>] 100%  ELAPSED TIME: 10.19 s     
--------------------------------------------------------------------------------
OK
1
2
Time taken: 11.48 seconds, Fetched: 2 row(s)
# The VERTICES table above shows that the Tez engine is being used

Hive Client Installation#

To use Hive commands on machines where Hive is not deployed, install the Hive client and the HDFS client

The Hive client addresses are assumed to be hive3,hive4,hive5

Preparation#

On the hive1 machine, create hiveclienthost

su root
touch ${HOME}/hiveclienthost

Add the following hostnames to hiveclienthost:

hive3
hive4
hive5

Exchange public keys to enable passwordless ssh login and configuration distribution

# Exchange public keys with the other machines in the cluster
lava ssh-exkeys -f ${HOME}/hiveclienthost -p ********
# Distribute the repo file to the other machines in the cluster
lava scp -f ${HOME}/hiveclienthost /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d

Installation#

lava ssh -f ${HOME}/hiveclienthost -e 'yum install -y hive'
lava ssh -f ${HOME}/hiveclienthost -e 'yum install -y hdfs mapreduce yarn'

lava ssh -f ${HOME}/hiveclienthost -e 'chown -R  hdfs:hadoop /usr/local/oushu/conf/common/'
lava scp -r -f ${HOME}/hiveclienthost /usr/local/oushu/conf/common/*  =:/usr/local/oushu/conf/common/
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R  hive:hadoop /usr/local/oushu/conf/hive/'
lava scp -r -f ${HOME}/hiveclienthost /usr/local/oushu/conf/hive/*  =:/usr/local/oushu/conf/hive/
lava ssh -f ${HOME}/hiveclienthost -e 'sudo mkdir -p /data1/hdfs/hive/'
lava ssh -f ${HOME}/hiveclienthost -e 'chown -R  hive:hadoop /data1/hdfs/hive/'

Check#

ssh hive3
su hive

# Enter the hive client
hive
# Test that Hive is usable
hive:>create database td_test;
OK 
Time taken:0.201 seconds

# Any returned result proves the client works