db:hadoop:yarn:fair

About

The fair scheduler is a scheduler plugin that fairly distributes an equal share of resources for jobs in the YARN cluster.

Articles Related

Concept

When there is a single app running, that app uses the entire cluster. When other apps are submitted, resources that free up are assigned to the new apps following the following rule:

Each apps app gets roughly the same amount of resources over time (known as equal share).

Unlike the default Hadoop scheduler, which forms a real queue of apps, this lets short apps finish in reasonable time while not starving long-lived apps.

Queue

Queue is the unit of resource fairness. They are used to;

group apps into queues
and shares resources fairly between them.

By default, all users share a single queue, named default.

If an app specifically lists a queue in a container resource request, the request is submitted to that queue.

Queue assignment / placement

Queues are assigned:

at request level (based on the queue name or the user and group name)
at a configuration level

based on the policy

Hierarchy of queue

Queues can be arranged in a hierarchy to divide resources and configured with weights to share the cluster in specific proportions.

All queues descend from a queue named root. Available resources are distributed from the root queue recursively among the children. Applications may only be scheduled on leaf queues.

Queues can be specified as children of other queues by placing them as sub-elements of their parents in the fair scheduler allocation file.

Queue Name

A queue’s name starts with the names of its parents, with periods as separators.

[root].parent.....children

where the root part of the name is optional.

Example:

root.queue1 (or queue1) is the queue under the root queue
root.parent1.queue2 (or parent1.queue2)

Constraint

minResources

Minimum Share.

The Fair Scheduler allows assigning minimum shares to queues (also known as guaranteed share). This is useful for ensuring that production applications always get sufficient resources.

When a queue contains apps, it gets at least its minimum share, but when the queue does not need it, the excess is split between other running apps.

maxResources

maxRunningApps

The Max number of running app.

The Fair Scheduler lets all apps run by default, but it is also possible to limit the number of running apps per user and per queue.

Limited apps will wait in the scheduler’s queue until the actual running apps finish.

Policy

Within each queue, a scheduling policy is used to share resources between the running apps.

The Fair Scheduler bases scheduling fairness decisions can be:

only on memory (Default) also called memory-based fair sharing
or on both memory and CPU (using the notion of Dominant Resource Fairness developed by Ghodsi et al). (knwon as multi-resource with Dominant Resource Fairness ??)
or FIFO

Policies extends org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.SchedulingPolicy. Built-in:

FifoPolicy,
FairSharePolicy (default),
and DominantResourceFairnessPolicy

Rule

A policy consists of a set of rules that are applied sequentially to classify an incoming application. Each rule either places the app into a queue, rejects it, or continues on to the next rule.

Priority

app priorities are used as weights to determine the fraction of total resources that each app should get.

Properties

Used Resources - The sum of resources allocated to containers within the queue.
Num Active Applications - The number of applications in the queue that have received at least one container.
Num Pending Applications - The number of applications in the queue that have not yet received any containers.
Min Resources - The configured minimum resources that are guaranteed to the queue.
Max Resources - The configured maximum resources that are allowed to the queue.
Instantaneous Fair Share - The queue’s instantaneous fair share of resources. These shares consider only actives queues (those with running applications), and are used for scheduling decisions. Queues may be allocated resources beyond their shares when other queues aren’t using them. A queue whose resource consumption lies at or below its instantaneous fair share will never have its containers preempted.
Steady Fair Share - The queue’s steady fair share of resources. These shares consider all the queues irrespective of whether they are active (have running applications) or not. These are computed less frequently and change only when the configuration or capacity changes.They are meant to provide visibility into resources the user can expect, and hence displayed in the Web UI.

Management

Installation

in the conf file

<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
</property>

Configuration

Two files for two scope:

at process start: the yarn-site.xml
on the fly: the allocation file (reloaded every 10 seconds). It lists which queues exist and their respective weights and capacities.

Resource Reservation

The reservation system in the context of the fair scheduler.

It's not enabled by default. The application can request reserved resources at runtime by specifying the reservationId during submission.

See:

UI

yarnUi/cluster/scheduler

Log

Fair Scheduler is able to dump its state periodically. It is disabled by default. The administrator can enable it by setting org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler.statedump logging level to DEBUG.

Fair Scheduler state dumps can potentially generate a large amount of log data.

Fair Scheduler logs go to the Resource Manager log file by default. Uncomment the “Fair scheduler state dump” section in log4j.properties to dump the state into a separate file.

Documentation / Reference

FairScheduler doc

Table of Contents

Yarn - Fair Scheduler