Spark - (Random) Split

1 - About

3 - Function

3.1 - randomSplit

randomSplit randomly splits an RDD into several RDDs according to the provided weights and returns them as a list, one RDD per weight.

randomSplit(weights, seed=None)


  • weights – weights for splits, will be normalized if they don’t sum to 1
  • seed – random seed
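As a rough illustration of the normalization mentioned above, here is a minimal pure-Python sketch. It is not Spark's code, and `normalize_weights` is a hypothetical helper introduced only for this example. It rescales the weights so that they sum to 1 while keeping their relative proportions.

```python
def normalize_weights(weights):
    """Rescale weights so they sum to 1 (hypothetical helper, not a Spark API)."""
    total = float(sum(weights))
    return [w / total for w in weights]


# Weights given as 8 : 1 : 1 describe the same split as .8, .1, .1
print(normalize_weights([8, 1, 1]))  # [0.8, 0.1, 0.1]
```

Under this reading, only the ratio between the weights matters, not their absolute values.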

Example of a percentage split (80% / 10% / 10%)

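# rawData is assumed to be an existing RDD
weights = [.8, .1, .1]  # 80% training, 10% validation, 10% test
seed = 42  # fixed seed so that the split can be reproduced
# Use randomSplit with weights and seed
rawTrainData, rawValidationData, rawTestData = rawData.randomSplit(weights, seed)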

The exact number of entries in each dataset varies slightly because the randomSplit() transformation assigns elements to the splits at random: the weights are expected proportions, not exact counts.
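To see why the counts are only approximate, the following plain-Python sketch mimics the idea without Spark. It is a simplified simulation under stated assumptions, not Spark's implementation, and `simulate_random_split` is a hypothetical name. Each element draws a random number and lands in the split whose cumulative-weight range contains that number.

```python
import random


def simulate_random_split(items, weights, seed=None):
    """Assign each item independently to one split, with probability
    proportional to that split's weight (simplified simulation, not Spark)."""
    rng = random.Random(seed)
    total = float(sum(weights))
    # Cumulative upper bounds of each split's share of [0, 1)
    bounds = []
    acc = 0.0
    for w in weights:
        acc += w / total
        bounds.append(acc)
    splits = [[] for _ in weights]
    last = len(bounds) - 1
    for item in items:
        r = rng.random()
        for i, upper in enumerate(bounds):
            # The last split also catches any floating-point rounding gap
            if r < upper or i == last:
                splits[i].append(item)
                break
    return splits


train, validation, held_out = simulate_random_split(range(1000), [.8, .1, .1], seed=42)
# The sizes are close to 800 / 100 / 100 but usually not exactly equal to them
print(len(train), len(validation), len(held_out))
```

Running the sketch again with the same seed gives the same split, while a different seed gives slightly different sizes. Every element always ends up in exactly one split.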
