Spark - Collect
About
The collect action returns the elements of an RDD to the driver program.
Because the whole result is materialized on the driver, all of the data must fit in the driver's memory.
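One way to respect that memory constraint is to cap how much is fetched before committing to a full collect. The sketch below is only an illustration: collect_bounded and its limit parameter are hypothetical names, not part of Spark. It assumes an RDD-like object and relies only on take(n), which returns at most n elements to the driver.

```python
def collect_bounded(rdd, limit):
    """Return the RDD's elements as a list, refusing to fetch more than `limit`.

    Hypothetical helper, not part of Spark. It only calls rdd.take(n),
    which returns at most n elements to the driver.
    """
    # Ask for one element more than allowed: if it comes back, the RDD is
    # larger than the limit, and we learned that without pulling everything.
    rows = rdd.take(limit + 1)
    if len(rows) > limit:
        raise ValueError(f"RDD has more than {limit} elements; not collecting")
    return rows
```

In a pyspark shell, where sc is the provided SparkContext, collect_bounded(sc.parallelize([1, 2, 3]), 10) should return the three elements as a list, while a limit of 2 would raise instead of collecting.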
Function
collect()
The collect() action returns all of the elements of the RDD to the driver as a local collection: a list in PySpark, an Array in Scala.
>>> # sc is the SparkContext provided by the pyspark shell
>>> rdd = sc.parallelize([1, 2, 3])
>>> rdd.collect()
[1, 2, 3]
collectAsMap()
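collectAsMap() returns the key-value pairs in this RDD to the driver program as a dictionary. Like collect(), it should only be used when the result is small enough to fit in the driver's memory.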
>>> m = sc.parallelize([(1, 2), (3, 4)]).collectAsMap()
>>> m[1]
2
>>> m[3]
4
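Because the result is a plain dictionary, a key that appears in several pairs can keep only one of its values. When every value matters, the pairs can be grouped on the driver instead. The sketch below is a minimal illustration under that assumption: collect_as_multimap is a hypothetical helper, not a Spark API, and it only calls collect() on an RDD of (key, value) pairs.

```python
from collections import defaultdict


def collect_as_multimap(rdd):
    """Collect a pair RDD into {key: [values, ...]}, keeping duplicate keys.

    Hypothetical helper, not part of Spark. It brings every pair to the
    driver via rdd.collect(), so it is only suitable for small results.
    """
    grouped = defaultdict(list)
    for key, value in rdd.collect():
        # Append rather than overwrite, so repeated keys keep all values.
        grouped[key].append(value)
    return dict(grouped)
```

In a pyspark shell this could be called as collect_as_multimap(sc.parallelize([(1, 2), (3, 4), (1, 5)])), which groups both values of key 1 into one list instead of dropping one of them.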