Spark - Collect

About

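The collect action returns all the elements of an RDD to the driver program.

Because the whole dataset is copied to a single machine, all of the data must fit in the memory of the driver program.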
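
One way to guard against pulling too much data is to fetch a bounded number of elements first. The following is a minimal sketch, not part of Spark: the helper name bounded_collect and the max_rows parameter are hypothetical, and the only RDD method it assumes is take(n), which returns at most n elements as a list.

```python
def bounded_collect(rdd, max_rows=1000):
    """Hypothetical helper (not a Spark API): collect an RDD only if it is small.

    Returns all elements as a list when the RDD holds at most max_rows
    elements, and raises ValueError otherwise.
    """
    if max_rows < 0:
        raise ValueError("max_rows must be non-negative")
    # Ask for one extra element: if it comes back, the RDD is larger
    # than max_rows and a full collect() could overload the driver.
    rows = rdd.take(max_rows + 1)
    if len(rows) > max_rows:
        raise ValueError(
            "RDD holds more than %d elements; refusing to collect" % max_rows
        )
    return rows
```

With a live SparkContext named sc, a call such as bounded_collect(sc.parallelize([1, 2, 3])) would be expected to return the same list as collect(), while a larger RDD would raise instead of filling the driver's memory.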

Function

collect()

The collect() action returns all of the elements of the RDD to the driver as a local collection: a list in Python (PySpark), an Array in Scala.

>>> rdd = sc.parallelize([1, 2, 3])
>>> rdd.collect()
[1, 2, 3]

collectAsMap()

collectAsMap() returns the key-value pairs in this RDD to the driver as a dictionary.

>>> m = sc.parallelize([(1, 2), (3, 4)]).collectAsMap()
>>> m[1]
2
>>> m[3]
4
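
A dictionary holds one value per key, so pairs that share a key are not all kept. In PySpark the result behaves like dict(rdd.collect()), meaning a later pair overwrites an earlier one with the same key. The sketch below reproduces this on a plain Python list (no Spark needed) and shows one way to keep every value; the sample data and the helper name group_pairs are assumptions made for illustration.

```python
from collections import defaultdict

# Stand-in for what collect() would return for a pair RDD with a repeated key.
pairs = [(1, 2), (1, 5), (3, 4)]

# Same construction as a plain dict: the later pair for key 1 wins.
as_map = dict(pairs)


def group_pairs(pairs):
    """Hypothetical helper: keep all values per key as a list."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return dict(grouped)


all_values = group_pairs(pairs)

print(as_map)      # {1: 5, 3: 4}
print(all_values)  # {1: [2, 5], 3: [4]}
```

On a cluster, a comparable grouping can be done before collecting, for example with rdd.groupByKey().mapValues(list).collectAsMap(); the order of values inside each list is then not guaranteed.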
