Spark - User

User identity

Do we need to add the user to all the nodes (head and worker nodes) of the cluster?

export HADOOP_USER_NAME=zorro
spark-submit
unset HADOOP_USER_NAME
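
A minimal sketch of the same idea that scopes the variable to a single command, so nothing has to be unset afterwards. run_as_hadoop_user is a hypothetical helper name, not part of Spark or Hadoop. Note that Hadoop honours HADOOP_USER_NAME only under simple authentication; on a Kerberos-secured cluster it is ignored.

```shell
# Hypothetical helper: run_as_hadoop_user USER COMMAND [ARGS...]
run_as_hadoop_user() {
    hadoop_user="$1"
    shift
    # env sets the variable for the child process only;
    # the calling shell's environment is left unchanged.
    env HADOOP_USER_NAME="$hadoop_user" "$@"
}

# Usage (application arguments elided):
#   run_as_hadoop_user zorro spark-submit ...
```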

sudo su - user
spark-submit

spark-submit --conf spark.yarn.appMasterEnv.HADOOP_USER_NAME=<user_name>
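
spark.yarn.appMasterEnv.[Name] sets an environment variable in the YARN Application Master process, which hosts the driver in cluster mode. The sketch below also sets spark.executorEnv.[Name], on the assumption that the executors should see the same user name; submit_with_yarn_user is a hypothetical helper name.

```shell
# Hypothetical helper: submit_with_yarn_user USER [SPARK-SUBMIT ARGS...]
submit_with_yarn_user() {
    yarn_user="$1"
    shift
    # appMasterEnv: environment of the YARN Application Master
    # executorEnv:  environment of the executors (assumed to be wanted here)
    spark-submit \
        --conf "spark.yarn.appMasterEnv.HADOOP_USER_NAME=$yarn_user" \
        --conf "spark.executorEnv.HADOOP_USER_NAME=$yarn_user" \
        "$@"
}

# Usage (application arguments elided):
#   submit_with_yarn_user zorro --master yarn --deploy-mode cluster ...
```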

export KRB5CCNAME=FILE:/tmp/krb5cc_$(id -u)_temp_$$
kinit -kt ~/.protectedDir/zorro.keytab <principal>@<REALM>
spark-submit ...........
kdestroy
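
The Kerberos sequence above can be wrapped so that the temporary ticket cache is destroyed even when the job fails. This is a sketch under assumptions: with_keytab is a hypothetical helper name, and the cache path reuses the naming scheme shown above.

```shell
# Hypothetical helper: with_keytab KEYTAB PRINCIPAL COMMAND [ARGS...]
with_keytab() {
    keytab="$1"
    principal="$2"
    shift 2
    (
        # Private ticket cache, so the user's default cache is not touched.
        KRB5CCNAME="FILE:/tmp/krb5cc_$(id -u)_temp_$$"
        export KRB5CCNAME
        # Destroy the tickets when this subshell exits, whatever the outcome.
        trap 'kdestroy' EXIT
        # Obtain a ticket from the keytab; give up if that fails.
        kinit -kt "$keytab" "$principal" || exit 1
        # Run the wrapped command; its exit status becomes the result.
        "$@"
    )
}

# Usage (principal and application arguments elided):
#   with_keytab ~/.protectedDir/zorro.keytab <principal> spark-submit ...
```

Running everything in a subshell keeps KRB5CCNAME and the trap out of the calling shell.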

--proxy-user NAME           User to impersonate when submitting the application.

Apache Livy, for example, uses this approach to submit Spark jobs on behalf of other end users.
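
A minimal sketch of impersonation with --proxy-user; submit_as is a hypothetical helper name. For this to work, the submitting account must be allowed to impersonate other users through the hadoop.proxyuser.<superuser>.hosts and hadoop.proxyuser.<superuser>.groups properties in core-site.xml. According to the spark-submit help text, --proxy-user does not work together with --principal / --keytab.

```shell
# Hypothetical helper: submit_as USER [SPARK-SUBMIT ARGS...]
submit_as() {
    proxy_user="$1"
    shift
    # The job is submitted by the current (super)user but runs as proxy_user.
    spark-submit --proxy-user "$proxy_user" "$@"
}

# Usage (application arguments elided):
#   submit_as zorro ...
```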