About
How to install:
- the sparklyr package
- a local Spark instance with Livy for a development environment
Installation
Package
# Optional: pin a snapshot repository, for instance the Microsoft one (MRAN)
# options(repos = "https://mran.microsoft.com/snapshot/2017-05-01")
install.packages("sparklyr")
# Alternative: the latest development version of sparklyr from GitHub
# devtools::install_github("rstudio/sparklyr")
- Load the library if needed
library(sparklyr)
library(dplyr)
Local Standalone Spark
See Spark - Standalone installation (spark scheme)
- Is a version already installed?
sparklyr::spark_installed_versions()
  spark hadoop                       dir
1 1.6.2    2.6 spark-1.6.2-bin-hadoop2.6
2 2.1.1    2.7 spark-2.1.1-bin-hadoop2.7
- List the available versions (it seems that spark_install must have been called once first):
sparklyr::spark_available_versions()
This output comes from the file “User_Home\Documents\R\win-library\3.3\sparklyr\extdata\install_spark.csv”.
You can edit this file to add a version that is available at https://d3kbcqa49mib13.cloudfront.net/
- Install the version that you want locally
sparklyr::spark_install(version = "1.6.2")
Installing Spark 1.6.2 for Hadoop 2.6 or later.
Downloading from:
- 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Installing to:
- 'C:\Users\gerardn\AppData\Local\rstudio\spark\Cache/spark-1.6.2-bin-hadoop2.6'
trying URL 'https://d3kbcqa49mib13.cloudfront.net/spark-1.6.2-bin-hadoop2.6.tgz'
Content type 'application/x-tar' length 278057117 bytes (265.2 MB)
downloaded 265.2 MB
Installation complete.
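The check and install steps above can be combined so that Spark is downloaded only when it is missing. This is a minimal sketch, not a sparklyr feature: spark_missing and ensure_spark are hypothetical helper names, and the sketch assumes that spark_installed_versions() returns a data frame with a spark column, as in the output shown earlier.

```r
# Hypothetical helper (not part of sparklyr): TRUE when `version` is absent
# from a data frame shaped like the output of spark_installed_versions().
spark_missing <- function(installed, version) {
  !(version %in% installed$spark)
}

# Hypothetical wrapper: download Spark only when it is not installed yet.
# The sparklyr:: prefix means the package is needed only when this is called.
ensure_spark <- function(version) {
  if (spark_missing(sparklyr::spark_installed_versions(), version)) {
    sparklyr::spark_install(version = version)
  }
  invisible(version)
}
```

Calling ensure_spark("1.6.2") would then skip the 265 MB download when that version is already in the local cache.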
- Restart RStudio and verify that the HADOOP_HOME environment variable is set
Sys.getenv("HADOOP_HOME")
[1] "C:\\Users\\gerardn\\AppData\\Local\\rstudio\\spark\\Cache\\spark-1.6.2-bin-hadoop2.6\\tmp\\hadoop"
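A quick way to confirm that the local installation works is to open and close a connection. The sketch below is one possible check, assuming the 1.6.2 version installed above; hadoop_home_set and check_local_spark are hypothetical helper names, while spark_connect, spark_version and spark_disconnect are sparklyr functions. The HADOOP_HOME warning mainly matters on Windows, as in the example path above.

```r
# Hypothetical helper: TRUE when HADOOP_HOME holds a non-empty value.
hadoop_home_set <- function(value = Sys.getenv("HADOOP_HOME")) {
  nzchar(value)
}

# Hypothetical check: connect to the local Spark, report its version,
# and always disconnect afterwards.
check_local_spark <- function(version = "1.6.2") {
  if (!hadoop_home_set()) {
    warning("HADOOP_HOME is empty; restart RStudio after spark_install()")
  }
  sc <- sparklyr::spark_connect(master = "local", version = version)
  on.exit(sparklyr::spark_disconnect(sc))
  sparklyr::spark_version(sc)
}
```

Running check_local_spark() should return the version of the Spark instance that was started, which is a simple sign that the installation is usable.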
Livy
- Install
sparklyr::livy_install(version = "0.3.0", spark_home = NULL, spark_version = NULL)
- Start / Stop
sparklyr::livy_service_start()
sparklyr::livy_service_stop()
- See also
sparklyr::livy_available_versions()
sparklyr::livy_install_dir()
sparklyr::livy_installed_versions()
sparklyr::livy_home_dir(version = NULL)
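Once Livy is installed, the start, connect and stop steps can be wrapped together. This is only a sketch under assumptions: with_livy is a hypothetical helper name, the URL assumes the Livy default port 8998 on the local machine, and the service may need a few seconds after starting before it accepts connections. The connection uses spark_connect with method = "livy".

```r
# Hypothetical wrapper: start the local Livy service, run `f` with a
# Livy-backed connection, then disconnect and stop the service.
with_livy <- function(f, url = "http://localhost:8998") {
  sparklyr::livy_service_start()
  # Stop the service when the function exits, even on error.
  on.exit(sparklyr::livy_service_stop(), add = TRUE)
  sc <- sparklyr::spark_connect(master = url, method = "livy")
  # Disconnect first; the on.exit handler then stops the service.
  tryCatch(f(sc), finally = sparklyr::spark_disconnect(sc))
}
```

For example, with_livy(function(sc) sparklyr::spark_version(sc)) would return the Spark version seen through Livy and leave no service running afterwards.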