RDD - Pipe


About

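pipe is a transformation.

pipe returns an RDD created by piping the elements of each partition to a forked external process: every element is written to the process's standard input as one line, and every line the process writes to its standard output becomes a string element of the resulting RDD.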

Snippet

pipe(command, env={})

Example with the Spark context (sc, sparkContext)

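# Get the Spark context from an existing SparkSession named spark
sc = spark.sparkContext
# Pipe every element through cat, which echoes its input unchanged
sc.parallelize(['1', '2', '', '3']).pipe('cat').collect()
# Result (the u prefix is how Python 2 prints unicode strings):
# [u'1', u'2', u'', u'3']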
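
To make the "piping elements to a forked external process" step more concrete, here is a minimal sketch that imitates the behavior for a single partition using only the Python standard library. It is not Spark's code: pipe_partition is a hypothetical helper introduced for illustration, and merging env into the parent environment is an assumption made so that the command can still be found on the PATH. It assumes a Unix-like system where cat and printenv are available.

```python
import os
import shlex
import subprocess


def pipe_partition(elements, command, env=None):
    """Imitate pipe for one partition (hypothetical helper, not Spark code)."""
    # Assumption: extra variables are layered on top of the parent environment.
    child_env = {**os.environ, **(env or {})}
    # Each element is written to the child's stdin as one line.
    stdin_data = "".join(str(element) + "\n" for element in elements)
    completed = subprocess.run(
        shlex.split(command),   # split the command string into argv
        input=stdin_data,
        capture_output=True,
        text=True,
        env=child_env,
        check=True,             # raise if the command exits with an error
    )
    # Each line of the child's stdout becomes one string element.
    return completed.stdout.splitlines()


result = pipe_partition(['1', '2', '', '3'], 'cat')
print(result)  # ['1', '2', '', '3']

# The env argument makes variables visible to the external command.
greeting = pipe_partition([], 'printenv GREETING', env={'GREETING': 'hello'})
print(greeting)  # ['hello']
```

The empty string survives the round trip because it is sent as an empty line and read back as an empty line, which matches the result of the cat example above.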







