Command-Line Installation#
If you want to deploy Spark from the command line, follow the steps in this chapter.
This section assumes the yum repository has already been configured on the machine with IP 192.168.1.10.
Spark Standalone Mode Installation#
Prerequisites#
Spark Standalone mode depends on Zookeeper and an HDFS cluster. Zookeeper supports Spark HA, and the HDFS cluster stores history data.
For Zookeeper deployment, see: Zookeeper Installation.
For HDFS deployment, see: HDFS Installation.
The Zookeeper service address is assumed to be oushu1:2181,oushu2:2181,oushu3:2181.
The HDFS nameservice address is assumed to be hdfs://oushu.
First log in to oushu1 and switch to the root user
ssh oushu1
su - root
Create a sparkhosts file containing all machines in the Spark cluster
cat > ${HOME}/sparkhosts << EOF
oushu1
oushu2
oushu3
oushu4
EOF
Create a sparkmasters file containing all master machines in the Spark cluster
cat > ${HOME}/sparkmasters << EOF
oushu1
oushu2
EOF
Create a sparkworkers file containing all worker machines in the Spark cluster
cat > ${HOME}/sparkworkers << EOF
oushu1
oushu2
oushu3
EOF
Configure the yum repository on the oushu1 node and install the lava command-line management tool
# Fetch the repo file from the machine hosting the yum repository
scp oushu@192.168.1.10:/etc/yum.repos.d/oushu.repo /etc/yum.repos.d/oushu.repo
# Append the yum repository machine's host information to /etc/hosts
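# For example (the hostname "yumrepo" below is a placeholder; replace it with the repo machine's actual hostname):
echo "192.168.1.10 yumrepo" >> /etc/hosts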
# Install the lava command-line management tool
yum clean all
yum makecache
yum install lava
Exchange public keys between the oushu1 node and the other nodes in the cluster to enable passwordless ssh login and distribution of configuration files.
lava ssh-exkeys -f ${HOME}/sparkhosts -p ********
Distribute the repo file to the other machines
lava scp -f ${HOME}/sparkhosts /etc/yum.repos.d/oushu.repo =:/etc/yum.repos.d
Installation#
Install Spark on all nodes with yum install
lava ssh -f ${HOME}/sparkhosts -e "sudo yum install -y spark"
Configuration#
Spark configuration parameters are kept in spark-defaults.conf and spark-env.sh. Template configuration files can be found in /usr/local/oushu/spark/conf.empty.
| File | Description |
| --- | --- |
| spark-defaults.conf | Default configuration used inside the program; the settings are loaded into SparkConf and have the lowest priority |
| spark-env.sh | Environment variables for process startup |
Configuration files take effect only when placed in Spark's configuration directory: /usr/local/oushu/conf/spark.
Prepare Data Directories#
Spark Workers save Driver execution logs to the local file system, so Spark must be configured with a usable file path
lava ssh -f ${HOME}/sparkhosts -e "mkdir -p /data1/spark/sparkwork"
lava ssh -f ${HOME}/sparkhosts -e "chown -R spark:spark /data1/spark"
Spark saves the Application run history to the HDFS cluster, so a usable HDFS path must be configured for Spark.
Log in to the HDFS cluster and run the following commands to create the HDFS path
sudo -u hdfs hdfs dfs -mkdir -p /spark/spark-history
sudo -u hdfs hdfs dfs -chown -R spark:spark /spark
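Optionally verify that the directory exists with the expected ownership (a quick sanity check, not part of the original steps):
sudo -u hdfs hdfs dfs -ls /spark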
Configure Dependent Clusters#
Log in to oushu1 and switch to the root user
ssh oushu1
su - root
Add the HDFS configuration file /usr/local/oushu/conf/spark/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://oushu</value>
</property>
</configuration>
Add the HDFS configuration file /usr/local/oushu/conf/spark/hdfs-site.xml. A template for hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
<property>
<name>rpc.client.timeout</name>
<value>3600000</value>
</property>
<property>
<name>rpc.client.connect.tcpnodelay</name>
<value>true</value>
</property>
<property>
<name>rpc.client.max.idle</name>
<value>10000</value>
</property>
<property>
<name>rpc.client.ping.interval</name>
<value>10000</value>
</property>
<property>
<name>rpc.client.connect.timeout</name>
<value>600000</value>
</property>
<property>
<name>rpc.client.connect.retry</name>
<value>10</value>
</property>
<property>
<name>rpc.client.read.timeout</name>
<value>3600000</value>
</property>
<property>
<name>rpc.client.write.timeout</name>
<value>3600000</value>
</property>
<property>
<name>rpc.client.socket.linger.timeout</name>
<value>-1</value>
</property>
<property>
<name>dfs.client.read.shortcircuit</name>
<value>true</value>
</property>
<property>
<name>dfs.default.replica</name>
<value>3</value>
</property>
<property>
<name>dfs.prefetchsize</name>
<value>10</value>
</property>
<property>
<name>dfs.client.failover.max.attempts</name>
<value>15</value>
</property>
<property>
<name>dfs.default.blocksize</name>
<value>134217728</value>
</property>
<property>
<name>dfs.client.log.severity</name>
<value>INFO</value>
</property>
<property>
<name>input.connect.timeout</name>
<value>600000</value>
</property>
<property>
<name>input.read.timeout</name>
<value>3600000</value>
</property>
<property>
<name>input.write.timeout</name>
<value>3600000</value>
</property>
<property>
<name>input.localread.default.buffersize</name>
<value>2097152</value>
</property>
<property>
<name>input.localread.blockinfo.cachesize</name>
<value>1000</value>
</property>
<property>
<name>input.read.getblockinfo.retry</name>
<value>3</value>
</property>
<property>
<name>output.replace-datanode-on-failure</name>
<value>false</value>
</property>
<property>
<name>output.default.chunksize</name>
<value>512</value>
</property>
<property>
<name>output.default.packetsize</name>
<value>65536</value>
</property>
<property>
<name>output.default.write.retry</name>
<value>10</value>
</property>
<property>
<name>output.connect.timeout</name>
<value>600000</value>
</property>
<property>
<name>output.read.timeout</name>
<value>3600000</value>
</property>
<property>
<name>output.write.timeout</name>
<value>3600000</value>
</property>
<property>
<name>output.packetpool.size</name>
<value>1024</value>
</property>
<property>
<name>output.close.timeout</name>
<value>900000</value>
</property>
<property>
<name>dfs.domain.socket.path</name>
<value>/var/lib/hadoop-hdfs/dn_socket</value>
</property>
<property>
<name>dfs.client.use.legacy.blockreader.local</name>
<value>false</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.oushu</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<property>
<name>dfs.ha.namenodes.oushu</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.http-address.oushu.nn1</name>
<value>oushu1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.oushu.nn2</name>
<value>oushu2:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.oushu.nn1</name>
<value>oushu1:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.oushu.nn2</name>
<value>oushu2:9000</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>oushu</value>
</property>
</configuration>
Change the following settings to match the actual HDFS deployment
<property>
<name>dfs.ha.namenodes.oushu</name>
<value>nn1,nn2</value>
</property>
<property>
<name>dfs.namenode.http-address.oushu.nn1</name>
<value>oushu1:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.oushu.nn2</name>
<value>oushu2:50070</value>
</property>
<property>
<name>dfs.namenode.rpc-address.oushu.nn1</name>
<value>oushu1:9000</value>
</property>
<property>
<name>dfs.namenode.rpc-address.oushu.nn2</name>
<value>oushu2:9000</value>
</property>
<property>
<name>dfs.nameservices</name>
<value>oushu</value>
</property>
Note
Spark Standalone mode deployment does not support Kerberos-secured HDFS
Configure Spark Master/Worker#
Log in to oushu1 and switch to the root user
ssh oushu1
su - root
Create the configuration file spark-defaults.conf
cat > ${HOME}/spark-defaults.conf << EOF
spark.master.rest.enabled=true
spark.master.rest.port=2881
EOF
Create the configuration file spark-env.sh. Change the Zookeeper address oushu1:2181,oushu2:2181,oushu3:2181 to the actual deployment address.
If the hostname:port form is used, the IP of each hostname must be configured in /etc/hosts (an example follows the script below).
cat > ${HOME}/spark-env.sh << EOF
export SPARK_MASTER_PORT="2882"
export SPARK_MASTER_WEBUI_PORT="2883"
export SPARK_WORKER_WEBUI_PORT="2885"
export SPARK_WORKER_DIR="/data1/spark/sparkwork"
export SPARK_LOG_DIR="/usr/local/oushu/log/spark"
export JAVA_HOME="/usr/lib/jvm/java"
export SPARK_MASTER_OPTS="-Dfile.encoding=UTF-8"
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=oushu1:2181,oushu2:2181,oushu3:2181 -Dspark.deploy.zookeeper.dir=/oushu270120"
EOF
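A sketch of the /etc/hosts entries needed when the hostname:port form is used; the IP addresses shown here are placeholders and must be replaced with the real ones:
cat >> /etc/hosts << EOF
192.168.1.11 oushu1
192.168.1.12 oushu2
192.168.1.13 oushu3
192.168.1.14 oushu4
EOF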
Distribute the configuration files to the other machines
lava scp -f ${HOME}/sparkworkers ${HOME}/spark-env.sh =:/tmp
lava scp -f ${HOME}/sparkworkers ${HOME}/spark-defaults.conf =:/tmp
lava ssh -f ${HOME}/sparkworkers -e "mv -f /tmp/spark-env.sh /usr/local/oushu/conf/spark"
lava ssh -f ${HOME}/sparkworkers -e "chown spark:spark /usr/local/oushu/conf/spark/spark-env.sh"
lava ssh -f ${HOME}/sparkworkers -e "mv -f /tmp/spark-defaults.conf /usr/local/oushu/conf/spark"
lava ssh -f ${HOME}/sparkworkers -e "chown spark:spark /usr/local/oushu/conf/spark/spark-defaults.conf"
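Optionally confirm the files arrived with the expected ownership (a quick check, assuming the same lava tooling):
lava ssh -f ${HOME}/sparkworkers -e "ls -l /usr/local/oushu/conf/spark"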
Configure the Spark History Server#
The History Server is deployed only on oushu1, so the configuration only needs to be appended to spark-defaults.conf on the oushu1 machine
echo 'spark.history.ui.port=2884
spark.history.fs.logDirectory=hdfs://oushu/spark/spark-history
spark.eventLog.dir=hdfs://oushu/spark/spark-history
spark.eventLog.enabled=true' >> /usr/local/oushu/conf/spark/spark-defaults.conf
Configure the Spark Client#
Log in to oushu4 and switch to the root user
ssh oushu4
su - root
Prepare the directory
mkdir -p /data1/spark/spark-warehouse
chown -R spark:spark /data1/spark
chmod 733 /data1/spark/spark-warehouse
Configure the spark-defaults.conf file
echo 'spark.sql.warehouse.dir=/data1/spark/spark-warehouse' >> /usr/local/oushu/conf/spark/spark-defaults.conf
Add the following configuration to the core-site.xml file
<property>
<name>hive.exec.scratchdir</name>
<value>file:///data1/spark/spark-warehouse</value>
</property>
Start#
Start the Spark Master#
Log in to the oushu1 node
ssh oushu1
su - root
Run the following to start the Spark Masters
lava ssh -f ${HOME}/sparkmasters -e "sudo -u spark /usr/local/oushu/spark/sbin/start-master.sh"
Start the Spark Workers#
Run the following to start the Spark Workers
lava ssh -f ${HOME}/sparkworkers -e "sudo -u spark /usr/local/oushu/spark/sbin/start-slave.sh 'spark://oushu1:2882,oushu2:2882'"
Start the History Server#
sudo -u spark /usr/local/oushu/spark/sbin/start-history-server.sh
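Optionally confirm the daemons are running before checking the web UIs; a sketch using jps (assumes a JDK with jps is on each node's PATH):
lava ssh -f ${HOME}/sparkmasters -e "jps | grep Master"
lava ssh -f ${HOME}/sparkworkers -e "jps | grep Worker"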
Check Status#
Check that the Spark Application run history is shown correctly by visiting http://oushu1:2884 in a browser.
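If a browser is not handy, a shell-based check (assumes curl is installed; an HTTP 200 indicates the UI is up):
curl -s -o /dev/null -w "%{http_code}\n" http://oushu1:2884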
Log in to oushu4 and switch to the root user
ssh oushu4
su - root
Check that the Spark Client can submit jobs normally
sudo -u spark /usr/local/oushu/spark/bin/spark-submit \
--class org.apache.spark.examples.SparkPi \
--master spark://oushu1:2882,oushu2:2882 \
--executor-memory 1G \
--total-executor-cores 3 \
/usr/local/oushu/spark/examples/jars/spark-examples_2.12-3.1.2.jar \
1000
Check that Spark SQL can start normally
sudo -u spark /usr/local/oushu/spark/bin/spark-sql \
--master spark://oushu1:2882,oushu2:2882 \
--executor-memory 1G \
--total-executor-cores 3
Run the following SQL statements in Spark SQL
show databases;
create table test(a int) using orc location 'hdfs://oushu/spark/test';
insert into test values(1);
select * from test;
Common Commands#
Stop the Spark services
# Stop the master
/usr/local/oushu/spark/sbin/stop-master.sh
# Stop the worker
/usr/local/oushu/spark/sbin/stop-slave.sh
# Stop the History Server
/usr/local/oushu/spark/sbin/stop-history-server.sh
Register with Skylab (Optional)#
On the oushu1 node, set the Skylab node IP in the lava command-line tool configuration
vi /usr/local/oushu/lava/conf/server.json
Write the registration request to a file, for example ~/spark-register.json
{
"data": {
"name": "SparkCluster",
"group_roles": [
{
// install the master nodes
"role": "spark.master",
"cluster_name": "oushu1",
"group_name": "master1",
// machine information for the nodes to install; it can be found in the machine metadata table of lavaadmin
"machines": [
{
"id": 1,
"name": "hostname1",
"subnet": "lava",
"data_ip": "127.0.0.1",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
},
{
// install the worker nodes
"role": "spark.worker",
"cluster_name": "oushu1",
"group_name": "worker1",
"machines": [
{
"id": 1,
"name": "hostname1",
"subnet": "lava",
"data_ip": "127.0.0.1",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
},
{
// install the history nodes
"role": "spark.history",
"cluster_name": "oushu1",
"group_name": "history1",
"machines": [
{
"id": 1,
"name": "hostname1",
"subnet": "lava",
"data_ip": "127.0.0.1",
"manage_ip": "",
"assist_port": 1622,
"ssh_port": 22
}
]
}
],
"config": {
"spark-defaults.conf": [
{
"key": "spark.master.rest.port",
"value": "2881"
},
{
"key": "spark.master.rest.enabled",
"value": "true"
},
{
"key": "spark.history.ui.port",
"value": "2884"
},
{
"key": "spark.history.fs.logDirectory",
"value": "hdfs://oushu/littleboy/spark/spark-history"
},
{
"key": "spark.eventLog.dir",
"value": "hdfs://oushu/littleboy/spark/spark-history"
},
{
"key": "spark.eventLog.enabled",
"value": "true"
}
],
"spark-env.sh": [
{
"key": "SPARK_LOG_DIR",
"value": "/usr/local/oushu/log/spark"
},
{
"key": "SPARK_MASTER_HOSTS",
"value": "oushu1,oushu2"
},
{
"key": "SPARK_MASTER_WEBUI_PORT",
"value": "2883"
},
{
"key": "SPARK_MASTER_PORT",
"value": "2882"
},
{
"key": "SPARK_WORKER_WEBUI_PORT",
"value": "2885"
},
{
"key": "SPARK_MASTER_OPTS",
"value": "\"-Dfile.encoding=UTF-8\""
},
{
"key": "SPARK_WORKER_DIR",
"value": "/data1/spark/sparkwork"
},
{
"key": "SPARK_DAEMON_JAVA_OPTS",
"value": "-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=oushu1:2181,oushu2:2181,oushu3:2181 -Dspark.deploy.zookeeper.dir=/oushu270120"
}
]
}
}
}
In the configuration above, the machine information in the machines arrays must be adjusted to match the actual environment. On the machine where the platform base component lava is installed, run:
psql lavaadmin -p 4432 -U oushu -c "select m.id,m.name,s.name as subnet,m.private_ip as data_ip,m.public_ip as manage_ip,m.assist_port,m.ssh_port from machine as m,subnet as s where m.subnet_id=s.id;"
From the machine information returned, add each machine to the machines array of the role that runs on it.
For example, if oushu1 is a Spark master node, oushu1's machine information must be added to the machines array of the spark.master role.
Register the cluster with the lava command:
lava login -u oushu -p ********
lava onprem-register service -s Spark -f ~/spark-register.json
If the return value is:
Add service by self success
the registration succeeded; if an error message is returned, handle it according to the message.
After logging in through the web UI, the newly added cluster appears under the corresponding service in the auto-deployment module, and the list monitors the status of the Spark processes on the machines in real time.
Spark Yarn Mode Installation#
Prerequisites#
Spark Yarn mode depends on a Yarn cluster and an HDFS cluster.
For Yarn deployment, see: Yarn Installation.
For HDFS deployment, see: HDFS Installation. The HDFS nameservice address is assumed to be hdfs://oushu.
First log in to oushu4 and switch to the root user
ssh oushu4
su - root
Installation#
For the Spark rpm installation, refer to the Spark Standalone mode section
Configuration#
Prepare Data Directories#
Create the history data directory on the HDFS cluster
sudo -u hdfs hdfs dfs -mkdir -p /spark/spark-history
sudo -u hdfs hdfs dfs -chown -R spark:spark /spark
Configure Dependent Clusters#
For the HDFS dependent-cluster configuration, refer to the Spark Standalone mode section
Configure Yarn#
Log in to yarn1 and install spark-shuffle on the Yarn cluster with yum install
ssh yarn1
su - root
cat > ${HOME}/yarnhost << EOF
yarn1
yarn2
yarn3
EOF
lava ssh -f ${HOME}/yarnhost -e "sudo yum install -y spark-shuffle"
Add the following configuration to /usr/local/oushu/conf/yarn/yarn-site.xml
<property>
<name>yarn.nodemanager.aux-services</name>
<value>spark_shuffle,mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.class</name>
<value>org.apache.spark.network.yarn.YarnShuffleService</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.spark_shuffle.classpath</name>
<value>/usr/local/oushu/spark-shuffle-3.1.2/yarn/spark-3.1.2-yarn-shuffle.jar</value>
</property>
Distribute the configuration file to all Yarn nodes and restart the NodeManagers
lava scp -f ${HOME}/yarnhost /usr/local/oushu/conf/yarn/yarn-site.xml =:/usr/local/oushu/conf/yarn/yarn-site.xml
lava ssh -f ${HOME}/yarnhost -e 'sudo -u yarn yarn --daemon stop nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start nodemanager'
Copy the configuration file /usr/local/oushu/conf/yarn/yarn-site.xml to oushu4
scp /usr/local/oushu/conf/yarn/yarn-site.xml root@oushu4:/usr/local/oushu/conf/spark/yarn-site.xml
Configure Kerberos (Optional)#
If HDFS is configured with Kerberos authentication, log in to the Kerberos server and run the following command to enter the Kerberos console
kadmin.local
In the console, run the following to configure the principal
addprinc -randkey spark@OUSHU.COM
ktadd -k /etc/security/keytabs/spark.keytab spark@OUSHU.COM
Note
Kerberos does not support uppercase letters in hostnames; if a hostname contains uppercase letters, change it to lowercase
Copy the generated keytab to oushu4
scp root@kerberosserver:/etc/security/keytabs/spark.keytab /etc/security/keytabs/spark.keytab
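Optionally verify that the keytab works on oushu4 before continuing (a sketch; assumes the Kerberos client tools are installed):
sudo -u spark kinit -kt /etc/security/keytabs/spark.keytab spark@OUSHU.COM
sudo -u spark klist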
Add the following configuration to the core-site.xml file
<property>
<name>hadoop.security.authentication</name>
<value>kerberos</value>
</property>
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
<property>
<name>hadoop.rpc.protection</name>
<value>authentication</value>
</property>
Add the following configuration to the hdfs-site.xml file
<property>
<name>dfs.data.transfer.protection</name>
<value>authentication</value>
</property>
<property>
<name>dfs.namenode.kerberos.principal.pattern</name>
<value>*</value>
</property>
Modify Configuration Files#
Create the configuration file spark-defaults.conf
cat > /usr/local/oushu/conf/spark/spark-defaults.conf << EOF
spark.history.ui.port=2884
spark.history.fs.logDirectory=hdfs://oushu/spark/spark-history
spark.eventLog.dir=hdfs://oushu/spark/spark-history
spark.eventLog.enabled=true
spark.yarn.stagingDir=hdfs://oushu/spark/staging
EOF
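If the parent path of spark.yarn.stagingDir does not yet exist on HDFS, it may need to be created and handed over to the spark user first (an assumption, not stated above; Spark can create the staging directory itself when the parent is writable):
sudo -u hdfs hdfs dfs -mkdir -p /spark/staging
sudo -u hdfs hdfs dfs -chown -R spark:spark /spark/staging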
Create the configuration file spark-env.sh
cat > /usr/local/oushu/conf/spark/spark-env.sh << EOF
export YARN_CONF_DIR=/usr/local/oushu/conf/spark
export JAVA_HOME="/usr/lib/jvm/java"
export SPARK_MASTER_OPTS="-Dfile.encoding=UTF-8"
EOF
Configure Kerberos for the History Server (optional). Append the following configuration to spark-defaults.conf
echo 'spark.history.kerberos.enabled=true
spark.history.kerberos.principal=spark@OUSHU.COM
spark.history.kerberos.keytab=/etc/security/keytabs/spark.keytab' >> /usr/local/oushu/conf/spark/spark-defaults.conf
Start#
Start the History Server
sudo -u spark /usr/local/oushu/spark/sbin/start-history-server.sh
Check Status#
Visit http://oushu4:2884 in a browser to check whether the History service is working
Submit a job
sudo -u spark /usr/local/oushu/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi \
--master yarn \
--deploy-mode cluster \
--driver-memory 1g \
--executor-memory 2g \
--executor-cores 1 \
--principal spark@OUSHU.COM \
--keytab /etc/security/keytabs/spark.keytab \
/usr/local/oushu/spark/examples/jars/spark-examples*.jar \
10
Note
To submit jobs that access Kerberos-secured HDFS/Yarn, --principal and --keytab must be specified
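To inspect the job output or diagnose failures, the aggregated logs can be fetched from Yarn (a sketch; <applicationId> is a placeholder for the id printed by spark-submit):
sudo -u spark yarn logs -applicationId <applicationId>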
Common Commands#
Stop the History Server
# Stop the History Server
sudo -u spark /usr/local/oushu/spark/sbin/stop-history-server.sh
FAQ#
Problem: job submission fails with
User spark not found
Cause: the spark user does not exist on the Yarn cluster.
Solution: create the spark user on the Yarn cluster.
Problem: job submission fails with
Requested user spark is not whitelisted and has id 993, which is below the minimum allowed 1000
Cause: Yarn refuses to run jobs for users whose user id is below 1000.
Solution 1: change the user id of spark on the Yarn cluster to 1000 or above
usermod -u 2001 spark
Solution 2: change the setting min.user.id=0 in the Yarn configuration file /etc/hadoop/container-executor.cfg, then restart the Yarn cluster (a restart sketch follows)
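A sketch of restarting the NodeManagers after changing min.user.id, reusing the yarnhost file created earlier (assumes the same lava tooling is still available on yarn1):
lava ssh -f ${HOME}/yarnhost -e 'sudo -u yarn yarn --daemon stop nodemanager'
lava ssh -f ${HOME}/yarnhost -e 'sudo -E -u yarn yarn --daemon start nodemanager'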