About
SparkR provides a distributed data frame implementation that supports operations such as selection, filtering, and aggregation (similar to R data frames and dplyr), but on large datasets. SparkR also supports distributed machine learning through MLlib.
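For example, a minimal sketch of these dplyr-like operations, assuming SparkR is installed and on the library path (faithful is a built-in R data frame):

library(SparkR)
sparkR.session()

df <- as.DataFrame(faithful)       # turn a local R data frame into a SparkDataFrame

# Selection and filtering
head(select(df, df$eruptions))
head(filter(df, df$waiting < 50))

# Aggregation: average eruption time per waiting value
head(summarize(groupBy(df, df$waiting), avg_eruptions = avg(df$eruptions)))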
Management
Shell
./bin/sparkR
The shell automatically creates a SparkSession (the connection to the cluster).
SparkDataFrame Creation
A SparkDataFrame can be created from (see the sketch after this list):
- structured data files,
- tables in Hive,
- external databases,
- or existing local R data frames.
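A few sketches of these creation paths; the file path, table name, and JDBC connection details below are illustrative, not taken from this article:

library(SparkR)
sparkR.session(enableHiveSupport = TRUE)   # Hive support is needed for the Hive example

# From a local R data frame
df <- as.DataFrame(faithful)

# From a structured data file (here JSON)
people <- read.df("examples/src/main/resources/people.json", "json")
printSchema(people)

# From a table registered in Hive
hive_df <- sql("SELECT * FROM src")

# From an external database over JDBC
jdbc_df <- read.jdbc("jdbc:postgresql://localhost/test", "my_table",
                     user = "user", password = "pass")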
Session
The SparkSession connects your R program to a Spark cluster. You create it by calling sparkR.session and passing in options (such as the application name, any Spark packages the program depends on, etc.). From the shell, the SparkSession is already created.
sparkR.session()
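When calling sparkR.session yourself, options can be passed explicitly. A sketch, where the master, application name, configuration, and package values are purely illustrative:

library(SparkR)
sparkR.session(
  master        = "local[*]",                              # cluster to connect to
  appName       = "MySparkRApp",                           # application name
  sparkConfig   = list(spark.driver.memory = "2g"),        # additional Spark configuration
  sparkPackages = "com.databricks:spark-avro_2.11:3.0.0"   # Spark packages the program depends on
)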
Configuration
Driver
SPARKR_DRIVER_R is the environment variable that defines the R binary executable to use for the SparkR shell (the default is R). The spark.r.shell.command property takes precedence if it is set.
Installation
- The installation from the Spark distribution doesn't work (see https://github.com/r-lib/devtools/issues/1191).
- Installation of version 2.2.0 from GitHub with devtools:
if (!require('devtools')) install.packages('devtools')
devtools::install_github('apache/spark@v2.2.0', subdir='R/pkg')
Downloading GitHub repo apache/spark@v2.2.0
from URL https://api.github.com/repos/apache/spark/zipball/v2.2.0
Installing SparkR
"C:/R/R-3.5.0/bin/x64/R" --no-site-file --no-environ --no-save --no-restore \
--quiet CMD INSTALL \
"C:/Users/gerard/AppData/Local/Temp/RtmpYTWtef/devtools64e063bc40c5/apache-spark-a2c7b21/R/pkg" \
--library="C:/R/R-3.5.0/library" --install-tests
In R CMD INSTALL
* installing *source* package 'SparkR' ...
** R
** inst
** tests
** byte-compile and prepare package for lazy loading
Creating a new generic function for 'as.data.frame' in package 'SparkR'
Creating a new generic function for 'colnames' in package 'SparkR'
Creating a new generic function for 'colnames<-' in package 'SparkR'
Creating a new generic function for 'cov' in package 'SparkR'
Creating a new generic function for 'drop' in package 'SparkR'
Creating a new generic function for 'na.omit' in package 'SparkR'
Creating a new generic function for 'filter' in package 'SparkR'
Creating a new generic function for 'intersect' in package 'SparkR'
Creating a new generic function for 'sample' in package 'SparkR'
Creating a new generic function for 'transform' in package 'SparkR'
Creating a new generic function for 'subset' in package 'SparkR'
Creating a new generic function for 'summary' in package 'SparkR'
Creating a new generic function for 'union' in package 'SparkR'
Creating a new generic function for 'endsWith' in package 'SparkR'
Creating a new generic function for 'startsWith' in package 'SparkR'
Creating a new generic function for 'lag' in package 'SparkR'
Creating a new generic function for 'rank' in package 'SparkR'
Creating a new generic function for 'sd' in package 'SparkR'
Creating a new generic function for 'var' in package 'SparkR'
Creating a new generic function for 'window' in package 'SparkR'
Creating a new generic function for 'predict' in package 'SparkR'
Creating a new generic function for 'rbind' in package 'SparkR'
Creating a generic function for 'substr' from package 'base' in package 'SparkR'
Creating a generic function for '%in%' from package 'base' in package 'SparkR'
Creating a generic function for 'lapply' from package 'base' in package 'SparkR'
Creating a generic function for 'Filter' from package 'base' in package 'SparkR'
Creating a generic function for 'nrow' from package 'base' in package 'SparkR'
Creating a generic function for 'ncol' from package 'base' in package 'SparkR'
Creating a generic function for 'factorial' from package 'base' in package 'SparkR'
Creating a generic function for 'atan2' from package 'base' in package 'SparkR'
Creating a generic function for 'ifelse' from package 'base' in package 'SparkR'
** help
No man pages found in package 'SparkR'
*** installing help indices
** building package indices
** installing vignettes
** testing if installed package can be loaded
*** arch - i386
*** arch - x64
* DONE (SparkR)