Hadoop - Cluster



This page is about the cluster in Hadoop.

A cluster is a group of processes (generally one machine per process) called nodes, where you will find two kinds of nodes: head nodes, which run the lead services, and worker nodes, which run the agent processes.

Additionally, you will most certainly have edge nodes that host the client applications (no cluster services run there).

A minimal Hadoop cluster needs HDFS and YARN.
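As a sketch, on a manual (non-managed) installation the two services are typically started with the scripts shipped in Hadoop's `sbin` directory (this assumes `HADOOP_HOME` points at your Hadoop installation and the cluster is already configured):

```shell
# Start the HDFS daemons (NameNode on the head node, DataNodes on the workers)
$HADOOP_HOME/sbin/start-dfs.sh

# Start the YARN daemons (ResourceManager on the head node, NodeManagers on the workers)
$HADOOP_HOME/sbin/start-yarn.sh

# List the running Java daemons to verify that the services are up
jps
```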




Computer - Capacity Planning (Sizing) for a cluster

It depends mostly on how Hadoop is used.

  • Storage - Disk: with the default file system (HDFS) and the default block size (128 MB). Example for 10 TB of data:

<MATH> \text{Permanent Storage} = 2 \cdot \text{Data Size} = 2 \cdot 10\,TB = 20\,TB \\ \text{Blocks} = 2 \cdot \frac{\text{Data Size}}{\text{Block Size}} = 2 \cdot \frac{10\,000\,000\,MB}{128\,MB} = 156{,}250 </MATH>
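The calculation above can be sketched in Python. The replication factor of 2 and the decimal conversion (1 TB = 1,000,000 MB) follow the page's example; note that HDFS actually defaults to a replication factor of 3:

```python
# Permanent storage sizing, following the page's example:
# replication factor 2, default block size 128 MB.
REPLICATION = 2
BLOCK_SIZE_MB = 128

def permanent_storage_tb(data_tb: float) -> float:
    """Raw disk needed to store the data, replicas included."""
    return REPLICATION * data_tb

def block_count(data_tb: float) -> int:
    """Number of HDFS blocks, replicas included."""
    data_mb = data_tb * 1_000_000  # decimal conversion, as in the example
    return int(REPLICATION * data_mb / BLOCK_SIZE_MB)

print(permanent_storage_tb(10))  # 20.0 (TB)
print(block_count(10))           # 156250 (blocks)
```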

  • Temp Storage: the output size of the map function (deleted at the end of the shuffle) plus the output size of the reduce function. Assuming the map output is not much larger than the input:

<MATH> \text{Temporary Storage} = 2 \cdot \text{Data Size} = 2 \cdot 10\,TB = 20\,TB </MATH>
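Under the same assumption (map output and reduce output each roughly equal to the input size), the temporary storage can be computed as:

```python
def temporary_storage_tb(data_tb: float) -> float:
    """Shuffle-time disk: map output (deleted after the shuffle) + reduce output,
    each assumed roughly equal to the input size."""
    map_output_tb = data_tb
    reduce_output_tb = data_tb
    return map_output_tb + reduce_output_tb

print(temporary_storage_tb(10))  # 20.0 (TB)
```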
