1. Download VMware and install a CentOS 9 virtual machine
2. Configure the user and create the directory
1. Log in as an administrator and create a dedicated user for Spark
sudo adduser sparkuser
2. Set a password for the new user (here: 123456)
sudo passwd sparkuser
3. Grant the new user sparkuser sudo privileges
Switch to root: su -
Edit the sudoers file (e.g. with visudo) and add the line: sparkuser ALL=(ALL) NOPASSWD:ALL
Save and quit: :wq
4. Log in as the newly created sparkuser and create the Spark directory
sudo mkdir /opt/spark
5. Change the owner of the Spark directory to sparkuser
sudo chown -R sparkuser:sparkuser /opt/spark
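As a quick sanity check of the steps above (standard id/sudo/ls commands; run the sudo check as root), you can verify that the user exists, that the NOPASSWD rule is in place, and that the directory ownership is correct:
id sparkuser
sudo -l -U sparkuser
ls -ld /opt/spark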
3. Download the Spark package, upload it to the VM, and extract it into the Spark directory
sudo tar -xvzf spark-3.5.3-bin-hadoop3.tgz -C /opt/spark --strip-components=1
(The --strip-components=1 option removes the top-level directory from the extracted files, so they go directly into /opt/spark.)
sudo chown -R sparkuser:sparkuser /opt/spark
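If the extraction worked, the Spark layout sits directly under /opt/spark; listing it should show directories such as bin, conf, jars and sbin:
ls /opt/spark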
4. Set environment variables
Add Spark to your PATH by editing the .bashrc or .bash_profile of the Spark user.
echo "export SPARK_HOME=/opt/spark" >> /home/sparkuser/.bashrc
echo "export PATH=\$PATH:\$SPARK_HOME/bin" >> /home/sparkuser/.bashrc
source /home/sparkuser/.bashrc
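To confirm the variables are picked up (a minimal check in a new shell, or after sourcing .bashrc):
echo $SPARK_HOME
which spark-shell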
5. Java Setup
Install Java
sudo yum install java-11-openjdk-devel
Check the version
java -version
Find the Java installation path
readlink -f $(which java)
Set environment variables
echo "export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-11.0.20.1.1-2.el9.x86_64" >> /home/sparkuser/.bashrc
echo "export PATH=\$JAVA_HOME/bin:\$PATH" >> /home/sparkuser/.bashrc
source /home/sparkuser/.bashrc
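Note that the JDK directory name above is specific to the installed build and may change after an update; one common alternative (a sketch, not required) is to derive JAVA_HOME from the readlink output instead of hard-coding it:
export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
echo $JAVA_HOME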
6. Start Spark
spark-shell
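Once the scala> prompt appears, a minimal smoke test using only the core Spark API (nothing Delta-specific yet) is:
spark.range(0, 100).count()   // should return 100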
7. Start Spark with Delta Lake
Run from /opt/spark (or use spark-shell directly, since it is already on the PATH):
bin/spark-shell --packages io.delta:delta-spark_2.12:3.2.0 \
  --conf "spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension" \
  --conf "spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog"
8. Test Delta Lake
val data = spark.range(0, 5)
data.write.format("delta").save("/tmp/delta-table")
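To confirm the write produced a Delta table, it can be read back in the same shell (standard Delta Lake quickstart usage):
val df = spark.read.format("delta").load("/tmp/delta-table")
df.show()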