Yarn - Capacity Scheduler
About
The CapacityScheduler is one of the pluggable schedulers that the YARN ResourceManager can use.
The CapacityScheduler provides:
- a set of limits to ensure that a single application, user, or queue cannot consume a disproportionate amount of resources
- limits on initialized and pending applications from a single user and queue
Model
Resource sharing is organized around the following concepts:
- Queues and sub-queues (hierarchical queues): free resources are shared among the sub-queues of a parent queue before queues in other branches are allowed to use them.
- Predefined queue called root. All queues in the system are children of the root queue.
- Queues are allocated a fraction of the capacity of the grid.
- Applications submitted to a queue have access to the capacity allocated to that queue. Administrators can configure soft limits and optional hard limits on the capacity allocated to each queue. The configured capacity acts as the soft limit: a queue may exceed it when other queues leave resources idle. The optional maximum capacity acts as the hard limit: a queue may never exceed it. Both are ordinary configuration values and can be changed later.
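- Queue ACLs that control who may submit applications to a queue and who may administer it
- Queue mapping based on user or group
- Application priority: a higher integer value indicates a higher priority for an application. Application priority is supported only with the FIFO ordering policy.
- Queue capacity configured either as percentages or as absolute resource values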
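When capacities are given as percentages, each queue receives a fraction of its parent, so the share of the whole grid that a nested queue is entitled to is the product of the percentages along its path, and the children of one parent are expected to add up to 100. The sketch below illustrates that arithmetic only. The percentages and the function names are made up for the illustration and do not come from a real cluster.

```python
# Hypothetical capacities, in percent of the parent queue (illustration only).
CAPACITY = {
    "root": 100.0,
    "root.a": 50.0,
    "root.b": 30.0,
    "root.c": 20.0,
    "root.a.a1": 40.0,
    "root.a.a2": 60.0,
}


def absolute_capacity(queue_path, capacity=CAPACITY):
    """Return the share of the whole grid, in percent, for a queue path."""
    parts = queue_path.split(".")
    share = 1.0
    # Multiply the fraction of every queue on the path from root down.
    for depth in range(1, len(parts) + 1):
        share *= capacity[".".join(parts[:depth])] / 100.0
    return share * 100.0


def children_sum_to_100(parent, capacity=CAPACITY):
    """Check that the direct children of `parent` add up to 100 percent."""
    child_depth = parent.count(".") + 1
    children = [
        q for q in capacity
        if q.startswith(parent + ".") and q.count(".") == child_depth
    ]
    return abs(sum(capacity[c] for c in children) - 100.0) < 1e-9
```

With these made-up numbers, `absolute_capacity("root.a.a1")` is 40 percent of 50 percent, that is, about 20 percent of the grid.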
Tree
The queue tree properties:
- The hierarchy is defined by the queue path in the property name yarn.scheduler.capacity.<queue-path>.queues. For instance, yarn.scheduler.capacity.root.a.queues lists the children of a, which sits one level below root (yarn.scheduler.capacity.root.queues).
- Several queues can be defined at each level, as a comma-separated list.
- Children do not inherit properties directly from their parent unless otherwise noted.
Configuration Management
capacity-scheduler.xml
The configuration follows the Hadoop configuration principles and is stored in the file capacity-scheduler.xml
Example with 3 levels (root, the children of root, and the children of a and b) and 8 queues below root (a, b, c, a1, a2, b1, b2, b3)
<property>
<name>yarn.scheduler.capacity.root.queues</name>
<!-- yarn.scheduler.capacity.<queue-path> -->
<value>a,b,c</value>
<description>The queues at this level (root is the root queue).
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.a.queues</name>
<value>a1,a2</value>
<description>The child queues of root.a.
</description>
</property>
<property>
<name>yarn.scheduler.capacity.root.b.queues</name>
<value>b1,b2,b3</value>
<description>The child queues of root.b.
</description>
</property>
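To see how the `.queues` properties above describe a tree, the following sketch reads them and lists the leaf queues, which are the queues that applications are submitted to. It is only an illustration: the function names are hypothetical, and the embedded XML repeats the example wrapped in the `<configuration>` element that Hadoop configuration files use.

```python
import xml.etree.ElementTree as ET

PREFIX = "yarn.scheduler.capacity."
SUFFIX = ".queues"

# The example from this article, wrapped in a <configuration> root element.
EXAMPLE = """<configuration>
  <property><name>yarn.scheduler.capacity.root.queues</name><value>a,b,c</value></property>
  <property><name>yarn.scheduler.capacity.root.a.queues</name><value>a1,a2</value></property>
  <property><name>yarn.scheduler.capacity.root.b.queues</name><value>b1,b2,b3</value></property>
</configuration>"""


def read_children(xml_text):
    """Map each parent queue path to the list of its child queue names."""
    children = {}
    for prop in ET.fromstring(xml_text).iter("property"):
        name = prop.findtext("name", default="").strip()
        value = prop.findtext("value", default="")
        if name.startswith(PREFIX) and name.endswith(SUFFIX):
            # Strip the prefix and suffix to keep only the <queue-path> part.
            parent = name[len(PREFIX):-len(SUFFIX)]
            children[parent] = [q.strip() for q in value.split(",") if q.strip()]
    return children


def leaf_queues(children, path="root"):
    """Return the full paths of the queues that have no children."""
    if path not in children:
        return [path]
    leaves = []
    for child in children[path]:
        leaves.extend(leaf_queues(children, path + "." + child))
    return leaves


if __name__ == "__main__":
    for queue in leaf_queues(read_children(EXAMPLE)):
        print(queue)
```

Keeping the parent-to-children mapping separate from the tree walk means the same mapping can also be used to check that every queue named in a list has its own definition.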
More Configuration
Modification
via CLI
Edit the file, then have the ResourceManager reload the queue configuration:
vi $HADOOP_CONF_DIR/capacity-scheduler.xml
$HADOOP_YARN_HOME/bin/yarn rmadmin -refreshQueues
via REST
See the Scheduler Configuration Mutation API in Yarn - ResourceManager Rest API: https://hadoop.apache.org/docs/current/hadoop-yarn/hadoop-yarn-site/ResourceManagerRest.html#Scheduler_Configuration_Mutation_API
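As a rough idea of what a mutation request body can look like, the sketch below builds an XML document that updates one parameter of one queue. The element names (`sched-conf`, `update-queue`, `queue-name`, `params`, `entry`) follow my reading of the linked page and should be checked against the documentation of your Hadoop version. The helper name, the queue path, and the value are hypothetical, and nothing is sent to a ResourceManager.

```python
import xml.etree.ElementTree as ET


def build_update_payload(queue_path, params):
    """Build an XML body that updates parameters of one existing queue.

    Assumption: element names are taken from the Scheduler Configuration
    Mutation API page; verify them for your Hadoop version before use.
    """
    root = ET.Element("sched-conf")
    update = ET.SubElement(root, "update-queue")
    ET.SubElement(update, "queue-name").text = queue_path
    params_el = ET.SubElement(update, "params")
    for key, value in params.items():
        entry = ET.SubElement(params_el, "entry")
        ET.SubElement(entry, "key").text = key
        ET.SubElement(entry, "value").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical change: cap root.a at 80 percent of its parent.
    print(build_update_payload("root.a", {"maximum-capacity": 80}))
```

The body would then be sent with an HTTP PUT to the scheduler configuration endpoint described on that page. The page also lists prerequisites on the ResourceManager side, which this sketch does not cover.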