Hadoop - Edge node



An edge node is a node that has the same client tools installed and configured as the headnodes, but runs no Hadoop services.

An edge node is a separate machine that isn’t used to store data or perform computation.


Prevent cluster crash

Many organizations submit jobs from the edge node. Because the edge node stores no data and runs no Hadoop services, it can go down without affecting the cluster.

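For instance, a poorly written Spark program can accidentally try to bring many terabytes of data back to the driver machine and crash it, for example by calling collect, or take with a very large row count. When the job is submitted in client mode, the driver runs on the edge node, so the crash stays on the edge node.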
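
As a rough illustration of submitting from an edge node, the sketch below builds a spark-submit command line in Python. The helper name build_spark_submit and the application name aggregate_job.py are hypothetical. The --master, --deploy-mode and --conf options and the spark.driver.maxResultSize setting are standard Spark ones, but the values shown are only assumptions to adapt to your cluster.

```python
import shlex


def build_spark_submit(app_path, deploy_mode="cluster",
                       max_result_size="1g", app_args=()):
    """Build a spark-submit argument list for a YARN cluster.

    deploy_mode="client"  -> the driver runs on the submitting machine
                             (the edge node).
    deploy_mode="cluster" -> the driver runs inside the cluster.
    max_result_size caps the total size of results an action such as
    collect may return to the driver; Spark aborts the job beyond it.
    """
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy_mode must be 'client' or 'cluster'")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--conf", f"spark.driver.maxResultSize={max_result_size}",
        app_path,
        *app_args,
    ]


# Hypothetical application and arguments, for illustration only.
cmd = build_spark_submit("aggregate_job.py", app_args=["--date", "2020-01-01"])
print(shlex.join(cmd))  # the command line to run on the edge node
```

Lowering spark.driver.maxResultSize makes an oversized collect fail fast with an error instead of exhausting the driver's memory.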


Edge nodes are also used for data science work on aggregate data that has been retrieved from the cluster. For example, a data scientist might submit a Spark job from an edge node to transform a 10 TB dataset into a 1 GB aggregated dataset, and then do analytics on the edge node using tools like R and Python.
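
The second half of that workflow, analysing the small aggregated extract on the edge node, could look like the minimal sketch below. It uses only the Python standard library. The summarize helper, the column names and the sample values are invented for illustration, and an in-memory string stands in for the file the Spark job would have written.

```python
import csv
import io
import statistics


def summarize(csv_file, value_column):
    """Return count, mean and max of one numeric column of a small CSV extract."""
    values = [float(row[value_column]) for row in csv.DictReader(csv_file)]
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "max": max(values),
    }


# Stand-in for the aggregated output retrieved from the cluster;
# the columns and numbers are made up.
sample = io.StringIO(
    "region,total_sales\n"
    "north,120.0\n"
    "south,80.0\n"
    "east,100.0\n"
)
print(summarize(sample, "total_sales"))
```

The heavy transformation stays on the cluster, and only a result small enough for a single machine's memory is read on the edge node.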
