About
SparkR provides a distributed data frame implementation that supports operations such as selection, filtering, and aggregation (similar to R data frames and dplyr), but on large datasets. SparkR also supports distributed machine learning through MLlib.
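For example, a minimal sketch of these dplyr-like operations, assuming SparkR is installed and on the library path (faithful is a built-in R data frame):

library(SparkR)
sparkR.session()

df <- as.DataFrame(faithful)       # turn a local R data frame into a SparkDataFrame

# Selection and filtering
head(select(df, df$eruptions))
head(filter(df, df$waiting < 50))

# Aggregation: average eruption time per waiting value
head(summarize(groupBy(df, df$waiting), avg_eruptions = avg(df$eruptions)))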
Management
Shell
./bin/sparkR
The shell automatically creates a SparkSession (the connection to the cluster).
SparkDataFrame Creation
A SparkDataFrame can be created from (see the sketch after this list):
- structured data files,
- tables in Hive,
- external databases,
- or existing local R data frames.
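A few sketches of these creation paths; the file path, table name, and JDBC connection details below are illustrative, not taken from this article:

library(SparkR)
sparkR.session(enableHiveSupport = TRUE)   # Hive support is needed for the Hive example

# From a local R data frame
df <- as.DataFrame(faithful)

# From a structured data file (here JSON)
people <- read.df("examples/src/main/resources/people.json", "json")
printSchema(people)

# From a table registered in Hive
hive_df <- sql("SELECT * FROM src")

# From an external database over JDBC
jdbc_df <- read.jdbc("jdbc:postgresql://localhost/test", "my_table",
                     user = "user", password = "pass")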
Session
The SparkSession connects your R program to a Spark cluster. You create it by calling sparkR.session and passing in options (such as the application name, any Spark packages the program depends on, etc.). From the shell, the SparkSession is already created.
sparkR.session()
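When calling sparkR.session yourself, options can be passed explicitly. A sketch, where the master, application name, configuration, and package values are purely illustrative:

library(SparkR)
sparkR.session(
  master        = "local[*]",                              # cluster to connect to
  appName       = "MySparkRApp",                           # application name
  sparkConfig   = list(spark.driver.memory = "2g"),        # additional Spark configuration
  sparkPackages = "com.databricks:spark-avro_2.11:3.0.0"   # Spark packages the program depends on
)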
Configuration
Driver
SPARKR_DRIVER_R is the environment variable that defines the R binary executable to use for the SparkR shell (the default is R). The spark.r.shell.command property takes precedence if it is set.
Installation
- The installation from the Spark distribution doesn't work (see https://github.com/r-lib/devtools/issues/1191).
- Installation of version 2.2.0 from GitHub with devtools:
if (!require('devtools')) install.packages('devtools')
devtools::install_github('apache/spark@v2.2.0', subdir='R/pkg')
Downloading GitHub repo apache/spark@v2.2.0
from URL https://api.github.com/repos/apache/spark/zipball/v2.2.0
Installing SparkR
"C:/R/R-3.5.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore \
--quiet CMD INSTALL \
"C:/Users/gerard/AppData/Local/Temp/RtmpYTWtef/devtools64e063bc40c5/apache-spark-a2c7b21/R/pkg" \
--library="C:/R/R-3.5.0/library" --install-tests
In R CMD INSTALL
* installing *source* package 'SparkR' ...
** R
** inst
** tests
** byte-compile and prepare package for lazy loading
Creating a new generic function for 'as.data.frame' in package 'SparkR'
Creating a new generic function for 'colnames' in package 'SparkR'
Creating a new generic function for 'colnames<-' in package 'SparkR'
Creating a new generic function for 'cov' in package 'SparkR'
Creating a new generic function for 'drop' in package 'SparkR'
Creating a new generic function for 'na.omit' in package 'SparkR'
Creating a new generic function for 'filter' in package 'SparkR'
Creating a new generic function for 'intersect' in package 'SparkR'
Creating a new generic function for 'sample' in package 'SparkR'
Creating a new generic function for 'transform' in package 'SparkR'
Creating a new generic function for 'subset' in package 'SparkR'
Creating a new generic function for 'summary' in package 'SparkR'
Creating a new generic function for 'union' in package 'SparkR'
Creating a new generic function for 'endsWith' in package 'SparkR'
Creating a new generic function for 'startsWith' in package 'SparkR'
Creating a new generic function for 'lag' in package 'SparkR'
Creating a new generic function for 'rank' in package 'SparkR'
Creating a new generic function for 'sd' in package 'SparkR'
Creating a new generic function for 'var' in package 'SparkR'
Creating a new generic function for 'window' in package 'SparkR'
Creating a new generic function for 'predict' in package 'SparkR'
Creating a new generic function for 'rbind' in package 'SparkR'
Creating a generic function for 'substr' from package 'base' in package 'SparkR'
Creating a generic function for '%in%' from package 'base' in package 'SparkR'
Creating a generic function for 'lapply' from package 'base' in package 'SparkR'
Creating a generic function for 'Filter' from package 'base' in package 'SparkR'
Creating a generic function for 'nrow' from package 'base' in package 'SparkR'
Creating a generic function for 'ncol' from package 'base' in package 'SparkR'
Creating a generic function for 'factorial' from package 'base' in package 'SparkR'
Creating a generic function for 'atan2' from package 'base' in package 'SparkR'
Creating a generic function for 'ifelse' from package 'base' in package 'SparkR'
** help
No man pages found in package 'SparkR'
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (SparkR)