An edge node is a machine that is separate from the cluster's worker nodes: it doesn't store the cluster's data or run the cluster's computations.
Preventing cluster crashes
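Many organizations have users submit jobs from an edge node. Because the edge node is separate from the cluster, it can go down without affecting the cluster itself, so a failure on the edge node, such as a user overloading it, doesn't bring down the cluster.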
Edge nodes are also used for data science work on aggregated data retrieved from the cluster. For example, a data scientist might submit a Spark job from an edge node to reduce a 10 TB dataset to a 1 GB aggregate, and then analyze that aggregate on the edge node with tools like R and Python.
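The two-stage workflow of aggregating on the cluster and then analyzing on the edge node might look roughly like the following Python sketch. It assumes PySpark is installed where the job is submitted. The input layout (a Parquet dataset with hypothetical `event_date` and `amount` columns), the function names, and the paths are illustrative assumptions, not part of any particular setup. `aggregate_on_cluster` would be submitted from the edge node (for example with `spark-submit`), so the heavy grouping runs on the cluster. `load_aggregate` and `summarize` then run only on the edge node, working on the small result.

```python
import csv
import statistics
from pathlib import Path


def aggregate_on_cluster(input_path: str, output_dir: str) -> None:
    """Stage 1: submitted from the edge node; the heavy work runs on the cluster.

    Assumes a hypothetical Parquet dataset with `event_date` and `amount` columns.
    """
    # Imported here so the edge-node helpers below work even where PySpark is absent.
    from pyspark.sql import SparkSession, functions as F

    spark = SparkSession.builder.appName("daily-aggregates").getOrCreate()
    events = spark.read.parquet(input_path)
    # Reduce the large event table to one row per day.
    daily = events.groupBy("event_date").agg(
        F.count("*").alias("n_events"),
        F.sum("amount").alias("total_amount"),
    )
    # The aggregate is small, so writing a single part file keeps it easy to fetch.
    daily.coalesce(1).write.mode("overwrite").option("header", True).csv(output_dir)
    spark.stop()


def load_aggregate(csv_path: Path) -> list[dict]:
    """Stage 2, on the edge node: read the small aggregated CSV into memory."""
    with open(csv_path, newline="") as f:
        return [
            {
                "event_date": row["event_date"],
                "n_events": int(row["n_events"]),
                "total_amount": float(row["total_amount"]),
            }
            for row in csv.DictReader(f)
        ]


def summarize(rows: list[dict]) -> dict:
    """Simple local analytics on the aggregate; no cluster resources are used."""
    totals = [r["total_amount"] for r in rows]
    busiest = max(rows, key=lambda r: r["n_events"])
    return {
        "days": len(rows),
        "mean_daily_total": statistics.mean(totals),
        "busiest_day": busiest["event_date"],
    }
```

Spark writes `output_dir` as a directory of `part-*.csv` files; with `coalesce(1)` there is a single part file, which `load_aggregate` would read once it has been copied to the edge node's local disk (for example with `hdfs dfs -get`). In practice a data scientist would likely use pandas or R at this stage; the standard library is used here only to keep the sketch free of extra dependencies.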