RDD - Partition

About

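This page covers partitions in a Spark RDD (resilient distributed dataset): how to set and get their number, and how to apply a function to each partition.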

Management

set

  • parallelize (example with two partitions):
rdd = sc.parallelize([1, 2, 3, 4], 2)

get

rdd.getNumPartitions()

mapPartitions

Return a new RDD by applying a function to each partition of this RDD.
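
As a minimal sketch of the description above, the function passed to mapPartitions receives an iterator over one partition's elements and returns (or yields) an iterator of results. The names sum_partition and run_with_spark, the local[2] master and the application name are illustrative assumptions, not part of the original page.

```python
from typing import Iterable, Iterator


def sum_partition(values: Iterable[int]) -> Iterator[int]:
    """Per-partition function: receives an iterator over one partition's
    elements and yields a single value, the sum of that partition."""
    yield sum(values)


def run_with_spark() -> list:
    """Apply sum_partition to each partition of a two-partition RDD.

    Assumes pyspark is installed; the import is local so that
    sum_partition stays usable without it.
    """
    from pyspark import SparkConf, SparkContext

    # Hypothetical local setup for illustration only.
    conf = SparkConf().setMaster("local[2]").setAppName("partition-demo")
    sc = SparkContext.getOrCreate(conf)
    try:
        # Same RDD as in the "set" example: four elements, two partitions.
        rdd = sc.parallelize([1, 2, 3, 4], 2)
        # The function runs once per partition, not once per element.
        return rdd.mapPartitions(sum_partition).collect()
    finally:
        sc.stop()
```

With the four elements split into the partitions [1, 2] and [3, 4], run_with_spark() should return [3, 7], one result per partition.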






