SparkR provides a distributed data frame implementation that supports operations like selection, filtering, aggregation etc. (similar to R data frames, dplyr) but on large datasets. SparkR also supports distributed machine learning using MLlib.
./bin/sparkR shell
The shell create automatically a SparkSession (connection).
from:
SparkSession in R.
The SparkSession connects your R program to a Spark cluster.
You call sparkR.session and pass in options (such as the application name, any spark packages depended on, etc.).
From the shell, the SparkSession is already created.
sparkR.session()
SPARKR_DRIVER_R is the configuration property that define the R binary executable to use for SparkR shell (default is R). Property spark.r.shell.command take precedence if it is set.
if (!require('devtools')) install.packages('devtools')
devtools::install_github('apache/[email protected]', subdir='R/pkg')
Downloading GitHub repo apache/[email protected]
from URL https://api.github.com/repos/apache/spark/zipball/v2.2.0
Installing SparkR
"C:/R/R-3.5.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore \
--quiet CMD INSTALL \
"C:/Users/gerard/AppData/Local/Temp/RtmpYTWtef/devtools64e063bc40c5/apache-spark-a2c7b21/R/pkg" \
--library="C:/R/R-3.5.0/library" --install-tests
In R CMD INSTALL
* installing *source* package 'SparkR' ...
** R
** inst
** tests
** byte-compile and prepare package for lazy loading
Creating a new generic function for 'as.data.frame' in package 'SparkR'
Creating a new generic function for 'colnames' in package 'SparkR'
Creating a new generic function for 'colnames<-' in package 'SparkR'
Creating a new generic function for 'cov' in package 'SparkR'
Creating a new generic function for 'drop' in package 'SparkR'
Creating a new generic function for 'na.omit' in package 'SparkR'
Creating a new generic function for 'filter' in package 'SparkR'
Creating a new generic function for 'intersect' in package 'SparkR'
Creating a new generic function for 'sample' in package 'SparkR'
Creating a new generic function for 'transform' in package 'SparkR'
Creating a new generic function for 'subset' in package 'SparkR'
Creating a new generic function for 'summary' in package 'SparkR'
Creating a new generic function for 'union' in package 'SparkR'
Creating a new generic function for 'endsWith' in package 'SparkR'
Creating a new generic function for 'startsWith' in package 'SparkR'
Creating a new generic function for 'lag' in package 'SparkR'
Creating a new generic function for 'rank' in package 'SparkR'
Creating a new generic function for 'sd' in package 'SparkR'
Creating a new generic function for 'var' in package 'SparkR'
Creating a new generic function for 'window' in package 'SparkR'
Creating a new generic function for 'predict' in package 'SparkR'
Creating a new generic function for 'rbind' in package 'SparkR'
Creating a generic function for 'substr' from package 'base' in package 'SparkR'
Creating a generic function for '%in%' from package 'base' in package 'SparkR'
Creating a generic function for 'lapply' from package 'base' in package 'SparkR'
Creating a generic function for 'Filter' from package 'base' in package 'SparkR'
Creating a generic function for 'nrow' from package 'base' in package 'SparkR'
Creating a generic function for 'ncol' from package 'base' in package 'SparkR'
Creating a generic function for 'factorial' from package 'base' in package 'SparkR'
Creating a generic function for 'atan2' from package 'base' in package 'SparkR'
Creating a generic function for 'ifelse' from package 'base' in package 'SparkR'
** help
No man pages found in package 'SparkR'
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (SparkR)