Capacity Planning (Sizing) in Spark to run an app
i.e. how to calculate executor-memory, executor-cores, and num-executors:
Via Ambari (or any other cluster admin tool) we can get the cluster composition.
For instance:
Total YARN memory = nodes * YARN memory per node
Total YARN memory = 8 nodes * 25GB = 200GB
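The total-memory arithmetic above can be sketched as follows (the node count and per-node YARN memory are the example's figures, and the variable names are ours, not standard Spark settings):

```python
# Total YARN memory available to Spark across the cluster.
nodes = 8                    # cluster size from the example above
yarn_memory_per_node_gb = 25 # per-node YARN allocation, as read from Ambari

total_yarn_memory_gb = nodes * yarn_memory_per_node_gb
print(total_yarn_memory_gb)  # 200
```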
For instance, 6GB of executor-memory is needed to load the data:
set executor-memory = 6GB
Setting executor-cores higher than 4 may cause garbage-collection problems, so:
set executor-cores = 4
The num-executors parameter is the minimum of the memory constraint and the CPU constraint, where each constraint is divided by the number of apps running on Spark concurrently.
Memory constraint = (total YARN memory / executor memory) / # of apps
Memory constraint = (200GB / 6GB) / 2
Memory constraint = 16.67 → 16 (rounded down)
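A minimal sketch of the memory-constraint step, using the example's figures (variable names are ours); note the result is rounded down, since a fractional executor cannot be scheduled:

```python
import math

total_yarn_memory_gb = 200  # from the cluster-composition step above
executor_memory_gb = 6      # memory needed per executor
num_apps = 2                # apps sharing the cluster concurrently

# (200 / 6) / 2 = 16.67, floored to 16.
memory_constraint = math.floor(total_yarn_memory_gb / executor_memory_gb / num_apps)
print(memory_constraint)  # 16
```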
YARN cores = nodes in cluster * # of cores per node * 2 (the factor of 2 assumes YARN vcores are configured at twice the physical cores)
YARN cores = 8 nodes * 8 cores per D14 node * 2 = 128
CPU constraint = (total YARN cores / # of cores per executor) / # of apps
CPU constraint = (128 / 4) / 2
CPU constraint = 16
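The CPU-constraint step, sketched with the same example figures (the 2x vcore factor and the variable names are assumptions taken from the formula above):

```python
nodes = 8         # cluster size from the example
cores_per_node = 8
vcore_factor = 2  # YARN vcores assumed configured at 2x physical cores

total_yarn_cores = nodes * cores_per_node * vcore_factor  # 128
executor_cores = 4  # cores per executor, chosen above
num_apps = 2        # apps sharing the cluster concurrently

# (128 / 4) / 2 = 16; integer division also rounds down if needed.
cpu_constraint = (total_yarn_cores // executor_cores) // num_apps
print(cpu_constraint)  # 16
```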
num-executors = Min (memory constraint, CPU constraint)
num-executors = Min (16, 16)
num-executors = 16
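Putting the whole recipe together, a small helper that sizes an app from the cluster parameters (a sketch under the assumptions above; the function name and parameters are ours, and the printed flags are standard spark-submit options):

```python
import math

def size_spark_app(nodes, yarn_mem_per_node_gb, cores_per_node,
                   executor_memory_gb, executor_cores, num_apps,
                   vcore_factor=2):
    """Return num-executors = min(memory constraint, CPU constraint)."""
    total_mem_gb = nodes * yarn_mem_per_node_gb
    total_cores = nodes * cores_per_node * vcore_factor
    mem_constraint = math.floor(total_mem_gb / executor_memory_gb / num_apps)
    cpu_constraint = math.floor(total_cores / executor_cores / num_apps)
    return min(mem_constraint, cpu_constraint)

# The worked example: 8 D14 nodes, 25GB YARN memory and 8 cores each,
# 6GB executors with 4 cores, 2 concurrent apps.
n = size_spark_app(8, 25, 8, 6, 4, 2)
print(f"--executor-memory 6G --executor-cores 4 --num-executors {n}")
```

Running it reproduces the num-executors = 16 result derived above.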