I am trying to create a Spark shell program, but I am getting an error when running it.
Below is the code I am executing.
from pyspark.sql import *
from pyspark import SparkConf

from lib.logger import Log4j

# conf = SparkConf()
# conf.set("spark.executor.extraJavaOptions",
#          "-Dlog4j.configuration=file:log4j.properties -Dspark.yarn.app.container.log.dir=app-logs -Dlogfile.name=hello-spark")

if __name__ == "__main__":
    spark = SparkSession.builder \
        .appName("Hello Spark") \
        .master("local[3]") \
        .getOrCreate()

    logger = Log4j(spark)

    logger.info("Starting HelloSpark")
    # your processing code
    logger.info("Finished HelloSpark")

    # spark.stop()
Python version: 3.12.4
PS C:\Spark\spark-3.5.1-bin-hadoop3> python --version
Python 3.12.4
Java version: 11.0.23
PS C:\Spark\spark-3.5.1-bin-hadoop3> Java --version
java 11.0.23 2024-04-16 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.23+7-LTS-222)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.23+7-LTS-222, mixed mode)
PS C:\Spark\spark-3.5.1-bin-hadoop3>
Error:
PS C:\Spark\spark-3.5.1-bin-hadoop3> spark-submit --properties-file C:\Spark\spark-3.5.1-bin-hadoop3\conf\spark-defaults.conf 'C:\Users\JainRonit\OneDrive - STCO\Desktop\Personal\Study\Coding\Pyspark\02-Spark-First-Project\HelloSpark.py'
24/07/26 11:13:49 INFO SparkContext: Running Spark version 3.5.1
24/07/26 11:13:49 INFO SparkContext: OS info Windows 11, 10.0, amd64
24/07/26 11:13:49 INFO SparkContext: Java version 11.0.23
24/07/26 11:13:50 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
24/07/26 11:13:50 ERROR SparkContext: Error initializing SparkContext.
java.lang.Exception: spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dlog4j.configuration=file:log4j.properties -Dspark.yarn.app.container.log.dir=app-logs -Dlogfile.name=HelloSpark'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit.
at org.apache.spark.SparkConf.$anonfun$validateSettings$4(SparkConf.scala:525)
at org.apache.spark.SparkConf.$anonfun$validateSettings$4$adapted(SparkConf.scala:521)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:521)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:410)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:834)
24/07/26 11:13:50 INFO SparkContext: SparkContext is stopping with exitCode 0.
24/07/26 11:13:50 INFO SparkContext: Successfully stopped SparkContext
Traceback (most recent call last):
File "C:\Users\JainRonit\OneDrive - STCO\Desktop\Personal\Study\Coding\Pyspark\02-Spark-First-Project\HelloSpark.py", line 13, in <module>
.getOrCreate()
^^^^^^^^^^^^^
File "C:\Spark\spark-3.5.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\sql\session.py", line 497, in getOrCreate
File "C:\Spark\spark-3.5.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\context.py", line 515, in getOrCreate
File "C:\Spark\spark-3.5.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\context.py", line 203, in __init__
File "C:\Spark\spark-3.5.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\context.py", line 296, in _do_init
File "C:\Spark\spark-3.5.1-bin-hadoop3\python\lib\pyspark.zip\pyspark\context.py", line 421, in _initialize_context
File "C:\Spark\spark-3.5.1-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\java_gateway.py", line 1587, in __call__
File "C:\Spark\spark-3.5.1-bin-hadoop3\python\lib\py4j-0.10.9.7-src.zip\py4j\protocol.py", line 326, in get_return_value
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.lang.Exception: spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dlog4j.configuration=file:log4j.properties -Dspark.yarn.app.container.log.dir=app-logs -Dlogfile.name=HelloSpark'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit.
at org.apache.spark.SparkConf.$anonfun$validateSettings$4(SparkConf.scala:525)
at org.apache.spark.SparkConf.$anonfun$validateSettings$4$adapted(SparkConf.scala:521)
at scala.Option.foreach(Option.scala:407)
at org.apache.spark.SparkConf.validateSettings(SparkConf.scala:521)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:410)
at org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:58)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at java.base/jdk.internal.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.base/java.lang.reflect.Constructor.newInstance(Constructor.java:490)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:247)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:238)
at py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:80)
at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:69)
at py4j.ClientServerConnection.waitForCommands(ClientServerConnection.java:182)
at py4j.ClientServerConnection.run(ClientServerConnection.java:106)
at java.base/java.lang.Thread.run(Thread.java:834)
24/07/26 11:13:50 INFO ShutdownHookManager: Shutdown hook called
24/07/26 11:13:50 INFO ShutdownHookManager: Deleting directory C:\Users\JainRonit\AppData\Local\Temp\spark-0326d309-090a-4a5f-af13-d7fe347ab38d
The spark-defaults.conf file:
spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dspark.yarn.app.container.log.dir=app-logs -Dlogfile.name=HelloSpark
I tried running my code with these defaults set in spark-defaults.conf, but I hit the error above during execution.
The error message states the problem clearly:
java.lang.Exception: spark.executor.extraJavaOptions is not allowed to set Spark options (was '-Dlog4j.configuration=file:log4j.properties -Dspark.yarn.app.container.log.dir=app-logs -Dlogfile.name=HelloSpark'). Set them directly on a SparkConf or in a properties file when using ./bin/spark-submit.
You are using a spark-defaults.conf file, but inside spark.executor.extraJavaOptions you are trying to set Spark options, and that key is reserved for plain JVM options. Spark rejects any -Dspark.* entry there; in your file the offending entry is -Dspark.yarn.app.container.log.dir=app-logs.
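If you prefer the properties-file approach that the error message itself suggests, the fix is to move the Spark option out of extraJavaOptions and set it as its own key. A corrected spark-defaults.conf might look like this (a sketch; note that spark.yarn.app.container.log.dir is normally supplied by YARN at runtime, so setting it by hand only matters if your log4j.properties references it):

    # JVM-only flags may stay in extraJavaOptions
    spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlogfile.name=HelloSpark
    # Spark options must be set as their own keys
    spark.yarn.app.container.log.dir=app-logs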
Alternatively, set these properties directly on your SparkConf object, or pass them to spark-submit with the --conf option.
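For the --conf route, the command would look roughly like this (a sketch reusing the path from your session; it keeps only the JVM-safe options, so it assumes the offending -Dspark.* entry is gone and --properties-file is dropped):

    spark-submit --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties -Dlogfile.name=HelloSpark" 'C:\Users\JainRonit\OneDrive - STCO\Desktop\Personal\Study\Coding\Pyspark\02-Spark-First-Project\HelloSpark.py'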
Here is the adjusted code, followed by the execution steps:
1. Modify HelloSpark.py:
from pyspark.sql import SparkSession
from pyspark import SparkConf

from lib.logger import Log4j

if __name__ == "__main__":
    conf = SparkConf() \
        .setAppName("Hello Spark") \
        .setMaster("local[3]") \
        .set("log4j.configuration", "file:log4j.properties") \
        .set("spark.yarn.app.container.log.dir", "app-logs") \
        .set("logfile.name", "HelloSpark")

    spark = SparkSession.builder \
        .config(conf=conf) \
        .getOrCreate()

    logger = Log4j(spark)

    logger.info("Starting HelloSpark")
    # your processing code
    logger.info("Finished HelloSpark")

    # spark.stop()
In this modified code:
- We create a SparkConf object and set the application name, the master URL, and the logging-related properties directly on it.
- We then pass that SparkConf object to SparkSession.builder so the configuration is applied when the session is created.
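Note that Log4j here is the custom wrapper imported from lib/logger.py, not a library class. If you are missing that file, a minimal sketch of such a wrapper (an assumption based only on how it is called above, not your actual implementation) would be:

    class Log4j:
        def __init__(self, spark):
            # Access the JVM's log4j through the py4j gateway of the SparkSession
            log4j = spark._jvm.org.apache.log4j
            self.logger = log4j.LogManager.getLogger("HelloSpark")

        def info(self, message):
            self.logger.info(message)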
2. Run the code with spark-submit:
spark-submit 'C:\Users\JainRonit\OneDrive - STCO\Desktop\Personal\Study\Coding\Pyspark\02-Spark-First-Project\HelloSpark.py'
Note that the --properties-file option is no longer needed, because the configuration is now set directly in the code.
With these changes, you should be able to run your Spark application successfully with logging configured correctly.
Tags: python, apache-spark, pyspark, apache-spark-sql From: 78796536