User identity
Simple
Does the user need to exist on every node of the cluster (head and worker nodes)?
- environment variable
export HADOOP_USER_NAME=zorro
spark-submit
unset HADOOP_USER_NAME
- sudo
sudo su - user
spark-submit
- conf in cluster mode
spark-submit --conf spark.yarn.appMasterEnv.HADOOP_USER_NAME=<user_name>
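The environment-variable option above can be scoped to a single command, so that nothing has to be unset afterwards. This is a minimal sketch assuming a POSIX shell and a cluster with simple (non-Kerberos) authentication. `run_as_hadoop_user` is a hypothetical helper name, not part of Spark or Hadoop.

```shell
# Hypothetical helper: run one command with HADOOP_USER_NAME set.
# The function body is a subshell "( ... )", so the variable never
# reaches the calling shell and no unset is needed afterwards.
run_as_hadoop_user() (
  HADOOP_USER_NAME="$1"
  export HADOOP_USER_NAME
  shift
  "$@"
)
```

It would be called as run_as_hadoop_user zorro spark-submit, followed by the usual spark-submit arguments.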
Kerberized
export KRB5CCNAME=FILE:/tmp/krb5cc_$(id -u)_temp_$$
kinit -kt ~/.protectedDir/zorro.keytab <principal>@<REALM>
spark-submit ...........
kdestroy
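The sequence above leaves the ticket in place if spark-submit fails or the shell is interrupted before kdestroy runs. One way to guard against that, sketched here under the assumption of a POSIX shell, is to wrap the steps in a function with an exit trap. `with_kerberos_ticket` is a hypothetical helper name; the keytab path and principal are passed in by the caller.

```shell
# Hypothetical helper: obtain a ticket from a keytab, run a command,
# and always destroy the ticket afterwards.
# Arguments: <keytab> <principal> <command> [args...]
with_kerberos_ticket() (
  keytab="$1"
  principal="$2"
  shift 2
  # Dedicated credential cache, same naming scheme as in the notes above.
  # Inside this subshell, $$ is still the PID of the calling shell.
  KRB5CCNAME="FILE:/tmp/krb5cc_$(id -u)_temp_$$"
  export KRB5CCNAME
  # Runs when the subshell exits, whether the command succeeded or not.
  trap 'kdestroy' EXIT
  kinit -kt "$keytab" "$principal" || exit 1
  "$@"
)
```

Because the body is a subshell, KRB5CCNAME does not leak into the calling shell. Two concurrent calls from the same shell would share one cache file, since both see the same $$.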
Proxy
With a Hadoop proxy user, spark-submit accepts:
--proxy-user NAME    User to impersonate when submitting the application.
Apache Livy, for example, uses this approach to submit Spark jobs on behalf of other end users.
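As a rough sketch of how a service account might wrap this flag, assuming a POSIX shell: `submit_as` is a hypothetical helper, and `SPARK_SUBMIT` is a hypothetical override variable added only so the command can be dry-run with echo. On the Hadoop side, the submitting account must be allowed to impersonate others through the hadoop.proxyuser.<superuser>.hosts and hadoop.proxyuser.<superuser>.groups properties in core-site.xml.

```shell
# Hypothetical helper: submit a Spark application on behalf of another user.
# Arguments: <user-to-impersonate> [spark-submit args...]
submit_as() (
  proxied_user="$1"
  shift
  # SPARK_SUBMIT is an assumed override hook; it defaults to spark-submit on PATH.
  "${SPARK_SUBMIT:-spark-submit}" --proxy-user "$proxied_user" "$@"
)
```

Setting SPARK_SUBMIT=echo prints the arguments that would be passed instead of submitting anything, which is handy for checking the wrapper before pointing it at a real cluster.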