Spark - User


User identity


Simple

Do we need to add the user to all the nodes (head and worker nodes) of the cluster?

  • environment variable
export HADOOP_USER_NAME=zorro
spark-submit
unset HADOOP_USER_NAME
  • sudo
sudo su - user
spark-submit
  • YARN application master environment
spark-submit --conf spark.yarn.appMasterEnv.HADOOP_USER_NAME=<user_name>
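
The environment-variable approach above can be scoped to a single command, so that no unset is needed afterwards. A minimal sketch, assuming simple (non-Kerberos) Hadoop authentication; the function name run_as_hadoop_user is hypothetical and not part of Spark or Hadoop:

```shell
# Hypothetical helper: run one command with HADOOP_USER_NAME set only for it.
# Assumes simple (non-Kerberos) authentication, where Hadoop trusts this variable.
run_as_hadoop_user() {
    hadoop_user=$1
    shift
    # env sets the variable for the child process only;
    # the environment of the calling shell is left unchanged.
    env HADOOP_USER_NAME="$hadoop_user" "$@"
}

# Usage (everything after the user name is the command to run):
# run_as_hadoop_user zorro spark-submit ...
```

Because the variable never enters the calling shell, a failed or interrupted submission cannot leave it set for later commands.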

Kerberized

export KRB5CCNAME=FILE:/tmp/krb5cc_$(id -u)_temp_$$
kinit -kt ~/.protectedDir/zorro.keytab <principal>@<REALM>
spark-submit ...........
kdestroy
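
The sequence above never reaches kdestroy if the shell is interrupted during spark-submit, which leaves the ticket cache behind. A minimal sketch that runs the same steps in a subshell and destroys the cache on exit; the function name with_kerberos_ticket and its argument order are hypothetical, and kinit and kdestroy are assumed to be on the PATH:

```shell
# Hypothetical helper: get a ticket in a private cache, run a command, clean up.
# Arguments: keytab path, principal, then the command to run.
with_kerberos_ticket() {
    keytab=$1
    principal=$2
    shift 2
    (
        # Private credential cache, named as in the commands above.
        KRB5CCNAME="FILE:/tmp/krb5cc_$(id -u)_temp_$$"
        export KRB5CCNAME
        kinit -kt "$keytab" "$principal" || exit 1
        # From here on, destroy the cache whenever the subshell exits.
        trap 'kdestroy' EXIT
        "$@"
        status=$?
        # Propagate the exit status of the wrapped command.
        exit "$status"
    )
}

# Usage:
# with_kerberos_ticket ~/.protectedDir/zorro.keytab <principal> spark-submit ...
```

The subshell keeps KRB5CCNAME and the trap out of the calling shell, so the caller's own ticket cache, if any, is untouched.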

Proxy

With a Hadoop proxy user (impersonation), pass the following spark-submit option:

--proxy-user NAME           User to impersonate when submitting the application.

Apache Livy, for example, uses this approach to submit Spark jobs on behalf of other end users.
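
A minimal sketch of a proxy submission. It assumes the submitting account is allowed to impersonate other users through the Hadoop properties hadoop.proxyuser.<superuser>.hosts and hadoop.proxyuser.<superuser>.groups in core-site.xml; the function name submit_as and the SPARK_SUBMIT override variable are hypothetical:

```shell
# Hypothetical helper: submit a Spark application on behalf of another user.
# SPARK_SUBMIT is an assumed override for the launcher path;
# when it is unset, spark-submit is looked up on the PATH.
submit_as() {
    proxy_user=$1
    shift
    # The caller authenticates as itself; the job is submitted as $proxy_user.
    "${SPARK_SUBMIT:-spark-submit}" --proxy-user "$proxy_user" "$@"
}

# Usage (remaining arguments are passed to spark-submit unchanged):
# submit_as zorro ...
```

The spark-submit help text states that --proxy-user does not work together with --principal / --keytab, so on a Kerberized cluster the submitting account would need its own ticket first, for example through kinit.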






