Spark - SparkR API


About

SparkR provides a distributed data frame implementation that supports operations such as selection, filtering, and aggregation (similar to R data frames and dplyr), but on large datasets. SparkR also supports distributed machine learning using MLlib.
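As a sketch of these operations, the snippet below converts the built-in `faithful` R data frame to a SparkDataFrame and applies selection, filtering, and aggregation (the session must already be running against a Spark installation):

```r
library(SparkR)
sparkR.session()

# Convert the built-in 'faithful' R data frame to a SparkDataFrame
df <- as.DataFrame(faithful)

# Selection, filtering, and aggregation, analogous to dplyr verbs
head(select(df, df$eruptions))
head(filter(df, df$waiting < 50))
head(summarize(groupBy(df, df$waiting), count = n(df$waiting)))
```

`head` collects only the first rows back to the local R process, which keeps the example cheap even on large datasets.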

Management

Shell

Spark - Shell

./bin/sparkR

The shell automatically creates a SparkSession (connection).

SparkDataFrame Creation

from:

  • structured data files,
  • tables in Hive,
  • external databases,
  • or existing local R data frames.
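The sources above can be sketched as follows; the JSON file path and view name are illustrative, and the Hive case assumes Hive support is enabled in the session:

```r
library(SparkR)
sparkR.session()

# From an existing local R data frame
df1 <- as.DataFrame(faithful)

# From a structured data file (path is an illustrative example)
df2 <- read.df("examples/src/main/resources/people.json", source = "json")

# From a table or temporary view, via SQL
createOrReplaceTempView(df1, "faithful_view")
df3 <- sql("SELECT * FROM faithful_view WHERE waiting < 50")
```

External databases are reached the same way through `read.df`/`read.jdbc` with the appropriate source and connection options.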

Session

SparkSession in R.

The SparkSession connects your R program to a Spark cluster.

You call sparkR.session and pass in options (such as the application name, any Spark packages the application depends on, etc.).

From the shell, the SparkSession is already created.

sparkR.session()
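A session can also be started with options; the application name, master URL, and memory setting below are illustrative values, not required ones:

```r
library(SparkR)

# Start (or reuse) a session with explicit options
sparkR.session(
  appName = "MySparkRApp",          # application name shown in the Spark UI
  master = "local[*]",              # run locally, using all cores
  sparkConfig = list(spark.driver.memory = "2g")
)
```

Calling sparkR.session when a session already exists (as in the shell) returns the existing session rather than creating a new one.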

Configuration

Driver

SPARKR_DRIVER_R is the environment variable that defines the R binary executable to use for the SparkR shell (default is R). The spark.r.shell.command property takes precedence if it is set.
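For example, either mechanism can point the shell at a specific R binary (the path below is illustrative):

```shell
# Via the environment variable
export SPARKR_DRIVER_R=/usr/local/bin/R
./bin/sparkR

# Via the property, which takes precedence if both are set
./bin/sparkR --conf spark.r.shell.command=/usr/local/bin/R
```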

Installation

  • Installation of SparkR version 2.2.0 from GitHub with devtools:
if (!require('devtools')) install.packages('devtools')
devtools::install_github('apache/[email protected]', subdir='R/pkg')
Downloading GitHub repo apache/[email protected]
from URL https://api.github.com/repos/apache/spark/zipball/v2.2.0
Installing SparkR
"C:/R/R-3.5.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore  \
  --quiet CMD INSTALL  \
  "C:/Users/gerard/AppData/Local/Temp/RtmpYTWtef/devtools64e063bc40c5/apache-spark-a2c7b21/R/pkg"  \
  --library="C:/R/R-3.5.0/library" --install-tests

Output of R CMD INSTALL:
* installing *source* package 'SparkR' ...
** R
** inst
** tests
** byte-compile and prepare package for lazy loading
Creating a new generic function for 'as.data.frame' in package 'SparkR'
Creating a new generic function for 'colnames' in package 'SparkR'
Creating a new generic function for 'colnames<-' in package 'SparkR'
Creating a new generic function for 'cov' in package 'SparkR'
Creating a new generic function for 'drop' in package 'SparkR'
Creating a new generic function for 'na.omit' in package 'SparkR'
Creating a new generic function for 'filter' in package 'SparkR'
Creating a new generic function for 'intersect' in package 'SparkR'
Creating a new generic function for 'sample' in package 'SparkR'
Creating a new generic function for 'transform' in package 'SparkR'
Creating a new generic function for 'subset' in package 'SparkR'
Creating a new generic function for 'summary' in package 'SparkR'
Creating a new generic function for 'union' in package 'SparkR'
Creating a new generic function for 'endsWith' in package 'SparkR'
Creating a new generic function for 'startsWith' in package 'SparkR'
Creating a new generic function for 'lag' in package 'SparkR'
Creating a new generic function for 'rank' in package 'SparkR'
Creating a new generic function for 'sd' in package 'SparkR'
Creating a new generic function for 'var' in package 'SparkR'
Creating a new generic function for 'window' in package 'SparkR'
Creating a new generic function for 'predict' in package 'SparkR'
Creating a new generic function for 'rbind' in package 'SparkR'
Creating a generic function for 'substr' from package 'base' in package 'SparkR'
Creating a generic function for '%in%' from package 'base' in package 'SparkR'
Creating a generic function for 'lapply' from package 'base' in package 'SparkR'
Creating a generic function for 'Filter' from package 'base' in package 'SparkR'
Creating a generic function for 'nrow' from package 'base' in package 'SparkR'
Creating a generic function for 'ncol' from package 'base' in package 'SparkR'
Creating a generic function for 'factorial' from package 'base' in package 'SparkR'
Creating a generic function for 'atan2' from package 'base' in package 'SparkR'
Creating a generic function for 'ifelse' from package 'base' in package 'SparkR'
** help
No man pages found in package  'SparkR'
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (SparkR)
