About
This page is about the notion of a cluster in Hadoop.
A cluster is a group of processes (generally one machine per process) called nodes, where you will find two kinds of nodes:
- head nodes, which host the main processes
- worker nodes, which host the agents of the main processes
Additionally, you will almost certainly have edge nodes, which host the client applications (no services). A minimal sketch of this topology follows.
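As an illustration only, the sketch below models the three node roles as plain Python data; the host names and counts are invented for the example and are not part of any Hadoop API.
<code python>
# Hypothetical sketch of a small Hadoop cluster topology.
# Host names and counts are invented for illustration.
cluster = {
    "head":   ["head-1", "head-2"],                  # main processes (e.g. NameNode, ResourceManager)
    "worker": [f"worker-{i}" for i in range(1, 6)],  # agents (e.g. DataNode, NodeManager)
    "edge":   ["edge-1"],                            # client applications only, no cluster services
}

for role, hosts in cluster.items():
    print(f"{role:>6}: {', '.join(hosts)}")
</code>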
Management
Installation
- To start a Hadoop cluster, you need to start both the HDFS and the YARN cluster (a minimal start-up sketch follows this list). More … Hadoop - Cluster Installation
- See also Single Node (mini cluster) creation
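As a minimal sketch, assuming a configured installation with HADOOP_HOME set and passwordless SSH between the nodes, the snippet below starts the two sub-clusters in order with the standard scripts shipped in $HADOOP_HOME/sbin; error handling is omitted.
<code python>
import os
import subprocess

# Minimal sketch: start the HDFS daemons, then the YARN daemons.
# Assumes HADOOP_HOME is set and the cluster is already configured.
sbin = os.path.join(os.environ["HADOOP_HOME"], "sbin")

subprocess.run([os.path.join(sbin, "start-dfs.sh")], check=True)   # NameNode + DataNodes
subprocess.run([os.path.join(sbin, "start-yarn.sh")], check=True)  # ResourceManager + NodeManagers
</code>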
Capacity
Computer - Capacity Planning (Sizing) for a cluster
It depends mostly on how Hadoop is used.
- Storage (disk): with the default file system and the default block size (128 MB). Example for 10 TB of data (a worked calculation follows this list):
<MATH> \text{Permanent Storage} = 2 \cdot \text{Data Size} = 2 \cdot 10\,\text{TB} = 20\,\text{TB} \\ \text{Blocks} = 2 \cdot \frac{\text{Data Size}}{\text{Block Size}} = 2 \cdot \frac{10{,}000{,}000\,\text{MB}}{128\,\text{MB}} = 156{,}250 </MATH>
- Temp Storage: the output size of the Map function (deleted at the end of the shuffle) plus the output size of the Reduce function. If the Map output is not much larger than the input, the temporary storage is roughly twice the data size:
<MATH> \text{Temporary Storage} = 2 \cdot \text{Data Size} = 2 \cdot 10\,\text{TB} = 20\,\text{TB} </MATH>
- Memory: the amount of memory depends on:
- the memory of each container (ApplicationMaster, Mappers, and Reducers). Each mapper should have at least 2048 MB because task assignment depends mostly on memory.
- the number of containers
- CPU: the same reasoning as for Memory
- Heap Size
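As a worked version of the formulas above, here is a hypothetical sizing sketch; the 10 TB input, replication factor of 2, 128 MB block size, 2048 MB per mapper, and 64 GB per worker are the example figures from this page or invented assumptions, not recommendations.
<code python>
# Hypothetical capacity-planning sketch using the example figures above.
DATA_SIZE_MB = 10_000_000    # 10 TB of input data, in MB
REPLICATION = 2              # storage factor used in the formulas above
BLOCK_SIZE_MB = 128          # default HDFS block size
MAPPER_MEMORY_MB = 2048      # minimum memory per mapper container

permanent_storage_mb = REPLICATION * DATA_SIZE_MB         # 20 TB
blocks = REPLICATION * DATA_SIZE_MB // BLOCK_SIZE_MB      # 156,250 blocks
temporary_storage_mb = REPLICATION * DATA_SIZE_MB         # Map + Reduce output, ~20 TB

print(f"Permanent storage: {permanent_storage_mb / 1_000_000} TB")
print(f"Blocks:            {blocks:,}")
print(f"Temporary storage: {temporary_storage_mb / 1_000_000} TB")

# Memory bounds the number of concurrent containers.
# Assumption: a worker node with 64 GB usable for YARN containers.
node_memory_mb = 64 * 1024
containers_per_node = node_memory_mb // MAPPER_MEMORY_MB  # 32 containers
print(f"Containers/node:   {containers_per_node}")
</code>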