Spark - (Random) Split

1 - About

2 - Function

2.1 - randomSplit

randomSplit randomly splits an RDD into several RDDs, one per weight, with sizes approximately proportional to the provided weights.


randomSplit(weights, seed=None)

where:

  • weights – weights for splits, will be normalized if they don’t sum to 1
  • seed – random seed; pass a fixed value to make the split reproducible
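As an illustration of what "normalized" means here, the sketch below rescales weights that do not sum to 1 and derives one range of [0, 1) per split. It is plain Python, not Spark code: normalize_weights and cumulative_bounds are hypothetical helper names, and Spark's internal implementation may differ in detail.

```python
def normalize_weights(weights):
    """Scale weights so they sum to 1 (hypothetical helper, not a Spark API)."""
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must have a positive sum")
    return [w / total for w in weights]


def cumulative_bounds(weights):
    """Return one (lower, upper) range within [0, 1] per split."""
    bounds = [0.0]
    for w in normalize_weights(weights):
        bounds.append(bounds[-1] + w)
    return list(zip(bounds, bounds[1:]))


# [8, 1, 1] describes the same proportions as [.8, .1, .1]
print(normalize_weights([8, 1, 1]))
print(cumulative_bounds([8, 1, 1]))
```

Under this reading, weights such as [8, 1, 1] and [.8, .1, .1] describe the same split.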

Example of an 80% / 10% / 10% split into training, validation and test sets:


# rawData is assumed to be an existing RDD
weights = [.8, .1, .1]
seed = 42  # fixed seed so that the split is reproducible
# Use randomSplit with weights and seed
rawTrainData, rawValidationData, rawTestData = rawData.randomSplit(weights, seed)

The exact number of entries in each dataset varies slightly because randomSplit() assigns each element to a split at random: the weights give the expected proportions, not exact sizes.
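The variation in split sizes can be reproduced without a cluster. The sketch below is a pure-Python simulation, not Spark's implementation: it assumes each element gets one uniform random draw and goes to the split whose range contains that draw. simulate_random_split is a hypothetical name used only for this illustration.

```python
import random


def simulate_random_split(items, weights, seed):
    """Assign each item to one split based on a single uniform random draw."""
    total = float(sum(weights))
    bounds = [0.0]
    for w in weights:
        bounds.append(bounds[-1] + w / total)
    bounds[-1] = 1.0  # guard against floating-point drift in the last bound

    rng = random.Random(seed)  # same seed, same split
    splits = [[] for _ in weights]
    for item in items:
        r = rng.random()  # uniform draw in [0, 1)
        for i in range(len(weights)):
            if bounds[i] <= r < bounds[i + 1]:
                splits[i].append(item)
                break
    return splits


train, validation, held_out = simulate_random_split(range(10000), [.8, .1, .1], seed=42)
# Sizes are close to 8000 / 1000 / 1000 but usually not exactly equal to them
print(len(train), len(validation), len(held_out))
```

Every item lands in exactly one split, so the sizes always add up to the input size, and rerunning with the same seed gives the same split.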

