Yarn - Capacity Scheduler

About

The CapacityScheduler is one of the pluggable schedulers that the YARN ResourceManager can use.
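
As a minimal sketch, the scheduler plugin is selected in yarn-site.xml through the yarn.resourcemanager.scheduler.class property. The fragment assumes a stock Apache Hadoop distribution, where the CapacityScheduler class lives in the package shown:

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <!-- Assumption: stock Apache Hadoop class name -->
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>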

The CapacityScheduler provides:

  • a set of limits to ensure that a single application, user or queue cannot consume a disproportionate amount of resources
  • limits on initialized and pending applications from a single user and queue
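
These limits map to properties in capacity-scheduler.xml. The fragment below is only a sketch: the queue root.a and all values are illustrative assumptions, not recommendations.

<property>
  <name>yarn.scheduler.capacity.maximum-applications</name>
  <!-- Illustrative value: cap on running plus pending applications in the system -->
  <value>10000</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.minimum-user-limit-percent</name>
  <!-- Illustrative value: each active user of root.a is guaranteed at least 25% of the queue under contention -->
  <value>25</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.user-limit-factor</name>
  <!-- Illustrative value: 1 means a single user cannot take more than the queue's configured capacity -->
  <value>1</value>
</property>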

Model

Resources are defined by:

  • Queues and sub-queues (hierarchical queues). Free resources are shared among the sub-queues of a parent queue before queues in other branches may use them.
  • Predefined queue called root. All queues in the system are children of the root queue.
  • Queues are allocated a fraction of the capacity of the grid.
  • Applications submitted to a queue have access to the capacity allocated to that queue. Administrators can configure soft limits and optional hard limits on the capacity allocated to each queue. A queue may use resources beyond its soft limit when other queues leave them idle, but never beyond its hard limit. Both limits are configuration values that can be changed later.
  • Queue ACLs that control who may submit applications to a queue and who may administer it
  • Queue Mapping based on User or Group
  • Application Priority (a higher integer value indicates a higher priority for an application; supported only with the FIFO ordering policy).
  • Absolute vs Percentage Value Resource Configuration
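
Several items of this model can be sketched as capacity-scheduler.xml properties. The fragment reads capacity as the soft limit and maximum-capacity as the hard limit, which is an interpretation rather than something stated above. The queue root.a, the users alice and bob, and the groups ops and analysts are hypothetical.

<property>
  <name>yarn.scheduler.capacity.root.a.capacity</name>
  <!-- Guaranteed share of the parent queue, in percent (illustrative) -->
  <value>50</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.maximum-capacity</name>
  <!-- Upper bound the queue may grow to when other queues are idle (illustrative) -->
  <value>80</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.acl_submit_applications</name>
  <!-- Format: comma-separated users, a space, then comma-separated groups -->
  <value>alice,bob analysts</value>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.acl_administer_queue</name>
  <!-- A leading space means "no users, only the listed groups" -->
  <value> ops</value>
</property>

<property>
  <name>yarn.scheduler.capacity.queue-mappings</name>
  <!-- u:user:queue maps a user, g:group:queue maps a group; queue names are hypothetical -->
  <value>u:alice:a1,g:analysts:b1</value>
</property>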

Tree

The queue tree properties:

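  • The hierarchy is defined by the levels of the path yarn.scheduler.capacity.<queue-path>, where <queue-path> is the dot-separated path from root to the queue. For instance, the queues listed under a (yarn.scheduler.capacity.root.a.queues) sit one level below those listed under root (yarn.scheduler.capacity.root.queues).
  • Several queues can be defined at each level.
  • Children do not inherit properties directly from the parent unless otherwise noted.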

Configuration Management

capacity-scheduler.xml

The configuration follows the Hadoop configuration conventions and is stored in the file capacity-scheduler.xml.

Example with three parent queues (root, a and b) and 8 queues (a, b, c, a1, a2, b1, b2, b3):

<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <!-- Pattern: yarn.scheduler.capacity.<queue-path>.queues -->
  <value>a,b,c</value>
  <description>The queues at this level (root is the root queue).
  </description>
</property>

<property>
  <name>yarn.scheduler.capacity.root.a.queues</name>
  <value>a1,a2</value>
  <description>The child queues of root.a.
  </description>
</property>

<property>
  <name>yarn.scheduler.capacity.root.b.queues</name>
  <value>b1,b2,b3</value>
  <description>The child queues of root.b.
  </description>
</property>
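
The queue tree alone is not a complete configuration: each queue also needs a capacity, and the capacities of the queues at each level must add up to 100. The following sketch completes the example with percentages chosen purely for illustration:

<!-- Level below root: 50 + 30 + 20 = 100 -->
<property>
  <name>yarn.scheduler.capacity.root.a.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.c.capacity</name>
  <value>20</value>
</property>

<!-- Children of root.a: 60 + 40 = 100 (percent of a, not of the cluster) -->
<property>
  <name>yarn.scheduler.capacity.root.a.a1.capacity</name>
  <value>60</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.a.a2.capacity</name>
  <value>40</value>
</property>

<!-- Children of root.b: 50 + 30 + 20 = 100 (percent of b) -->
<property>
  <name>yarn.scheduler.capacity.root.b.b1.capacity</name>
  <value>50</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.b2.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.b.b3.capacity</name>
  <value>20</value>
</property>

With these values, a1 is guaranteed 60% of the 50% given to a, that is 30% of the cluster.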

More Configuration

Modification

via Cli

Yarn - Yarn CLI

vi $HADOOP_CONF_DIR/capacity-scheduler.xml
$HADOOP_YARN_HOME/bin/yarn rmadmin -refreshQueues

via Rest
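
A minimal sketch, assuming a Hadoop 3.x ResourceManager that exposes the scheduler configuration mutation endpoint (/ws/v1/cluster/scheduler-conf) and has yarn.scheduler.configuration.store.class set to a mutable store such as memory, leveldb or zk. Under those assumptions, a PUT request with Content-Type application/xml and a body like the following would update one property of a queue. The queue root.a and the value 80 are illustrative.

<sched-conf>
  <update-queue>
    <!-- Full path of the queue to update (hypothetical queue) -->
    <queue-name>root.a</queue-name>
    <params>
      <entry>
        <!-- Property name relative to the queue path -->
        <key>maximum-capacity</key>
        <value>80</value>
      </entry>
    </params>
  </update-queue>
</sched-conf>

Unlike the file-based approach, a change made this way is applied by the ResourceManager without editing capacity-scheduler.xml by hand.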

Documentation / Reference

