Services and components: Hadoop 2.7.1, Zookeeper 3.4.8, Scala 2.11.8
Extract the archive:
tar -zxvf spark-2.1.1-bin-hadoop2.7.tgz
Rename the directory:
mv spark-2.1.1-bin-hadoop2.7 spark
Configure the environment variables:
vi /etc/profile
export SPARK_HOME=/usr/local/src/spark
export PATH=$PATH:$SPARK_HOME/bin
Reload the profile:
source /etc/profile
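Optionally confirm that the variables took effect (with the layout above, these should print /usr/local/src/spark and /usr/local/src/spark/bin/spark-shell):
echo $SPARK_HOME
which spark-shell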
Change to the Spark configuration directory:
cd /usr/local/src/spark/conf/
Copy the template:
cp spark-env.sh.template spark-env.sh
Edit the file:
vim spark-env.sh
export JAVA_HOME=/usr/local/src/jdk1.8.0_152/
export HADOOP_HOME=/usr/local/src/hadoop-2.7.1/
export SCALA_HOME=/usr/local/src/scala
export SPARK_MASTER_IP=master
export SPARK_MASTER_PORT=7077
export SPARK_DIST_CLASSPATH=$(/usr/local/src/hadoop-2.7.1/bin/hadoop classpath)
export HADOOP_CONF_DIR=/usr/local/src/hadoop-2.7.1/etc/hadoop/
export SPARK_YARN_USER_ENV="CLASSPATH=/usr/local/src/hadoop-2.7.1/etc/hadoop/"
export YARN_CONF_DIR=/usr/local/src/hadoop-2.7.1/etc/hadoop/
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=master:2181,slave1:2181,slave2:2181 -Dspark.deploy.zookeeper.dir=/spark"
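To verify that the hadoop classpath call used for SPARK_DIST_CLASSPATH resolves, run it directly; it should print a long list of Hadoop jar and config paths:
/usr/local/src/hadoop-2.7.1/bin/hadoop classpath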
Copy the template:
cp slaves.template slaves
Edit the file and list the worker nodes:
vim slaves
master
slave1
slave2
Distribute Spark to the other nodes:
scp -r /usr/local/src/spark/ slave1:/usr/local/src/
scp -r /usr/local/src/spark/ slave2:/usr/local/src/
Configure the environment variables on slave1 and slave2, then reload the profile on each node:
vim /etc/profile
export SPARK_HOME=/usr/local/src/spark
export PATH=$PATH:$SPARK_HOME/bin
source /etc/profile
Start Spark on the master node (make sure the Hadoop HA cluster is already running first):
cd /usr/local/src/spark/sbin
./start-all.sh
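Optionally check the daemons with jps; on the master node a Master and a Worker process should appear alongside the Hadoop and Zookeeper daemons (the exact list depends on what else is running):
jps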
Start a standby Master on slave1:
cd /usr/local/src/spark/sbin
./start-master.sh
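With spark.deploy.recoveryMode=ZOOKEEPER set, this Master starts in standby mode and takes over only if the active Master fails. A quick check (assuming curl is available) is to look for STANDBY in slave1's web UI:
curl -s http://slave1:8080 | grep -i standby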
Open the web UI at http://master:8080; the page header should read:
Spark Master at spark://master:7077
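As a smoke test of the standalone cluster, submit the bundled SparkPi example; the jar path below is the one shipped in the Spark 2.1.1 binary distribution, so adjust it if your build differs:
cd /usr/local/src/spark
./bin/spark-submit --master spark://master:7077 --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.1.1.jar 10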
Run a WordCount example in spark-shell:
1. Put the input file into HDFS (it will be loaded into an RDD below):
cd /usr/local/src/spark
hadoop fs -put README.md /
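Confirm that the file is in HDFS:
hadoop fs -ls /README.md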
2. Launch spark-shell on the YARN cluster manager:
cd /usr/local/src/spark/bin/
./spark-shell --master yarn --deploy-mode client
scala> val textFile=sc.textFile("/README.md")
3. Apply transformations and actions to the RDD:
scala> val wordcount=textFile.flatMap(line=>line.split(" ")).map(word=>(word,1)).reduceByKey(_+_)
scala> wordcount.collect()
scala> wordcount.collect().foreach(println)
scala> :q
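To confirm that YARN mode also works non-interactively, the same SparkPi example can be submitted in yarn cluster mode (same assumed jar path as in the standalone test above):
cd /usr/local/src/spark
./bin/spark-submit --master yarn --deploy-mode cluster --class org.apache.spark.examples.SparkPi examples/jars/spark-examples_2.11-2.1.1.jar 10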