About
Spark DataSet - Session (SparkSession|SQLContext) in PySpark
In the PySpark shell, a session is already created and available as the variable spark
Command
If SPARK_HOME is set
If SPARK_HOME is set, getting a SparkSession from Python calls the script SPARK_HOME\bin\spark-submit, which in turn calls SPARK_HOME\bin\spark-class2.
Example: the SparkSession builder code below
spark = SparkSession \
.builder \
.appName("nico app") \
.config("spark.debug.maxToStringFields", "50") \
.getOrCreate()
results in the following command:
java ^
-cp "C:\spark-2.2.0-bin-hadoop2.7\bin\..\conf\;C:\spark-2.2.0-bin-hadoop2.7\bin\..\jars\*" ^
-Xmx1g org.apache.spark.deploy.SparkSubmit ^
--conf "spark.debug.maxToStringFields=50" ^
--conf "spark.app.name=nico app" ^
pyspark-shell
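The mapping from builder calls to --conf flags can be sketched in plain Python (an illustration only, not Spark's actual implementation; the build_submit_args helper is hypothetical):

```python
# Hypothetical helper showing how each .config() key/value pair from the
# builder ends up as a --conf "key=value" flag on the spark-submit command
# line, and how appName() becomes --conf "spark.app.name=...".
def build_submit_args(app_name, conf):
    args = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    args += ["--conf", f"spark.app.name={app_name}"]
    args.append("pyspark-shell")  # tells spark-submit to start a PySpark shell
    return args

print(build_submit_args("nico app", {"spark.debug.maxToStringFields": "50"}))
# → ['--conf', 'spark.debug.maxToStringFields=50',
#    '--conf', 'spark.app.name=nico app', 'pyspark-shell']
```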
If SPARK_HOME is not set
In that case, PySpark appears to launch Java directly:
2018-07-02 12:46:11 WARN NativeCodeLoader:62 -
Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
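Which of the two launch paths applies can be checked from Python by inspecting the environment variable (a stdlib-only sketch, not part of the PySpark API):

```python
import os

# If SPARK_HOME is set, getOrCreate() goes through SPARK_HOME\bin\spark-submit;
# otherwise PySpark falls back to launching the JVM itself.
spark_home = os.environ.get("SPARK_HOME")
if spark_home:
    print(f"SPARK_HOME is set to {spark_home}: spark-submit will be used")
else:
    print("SPARK_HOME is not set: Java is launched directly")
```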
Support
spark.app.name is not recognized as an internal or external command
'C:\spark-2.2.0-bin-hadoop2.7\bin\spark-submit2.cmd" --conf "spark.app.name' is not recognized as an internal or external command
Possible resolution: check your spark-submit.cmd script and remove the quotes around the script path.
- Bad:
cmd /V /E /C "%~dp0spark-submit2.cmd" %*
- Good:
cmd /V /E /C %~dp0spark-submit2.cmd %*