About
Sparkling Water provides H2O's fast, scalable machine learning engine inside a Spark cluster.
Sparkling Water is distributed as a Spark application library which can be used by any Spark application.
Management
Installation
Docker
See https://github.com/h2oai/sparkling-water/tree/master/docker
Local Installation
- Get the zip file at http://h2o-release.s3.amazonaws.com/sparkling-water/rel-[SW Major Version]/[SW Minor Version]/index.html (Example: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/17/index.html)
- Unzip it
unzip sparkling-water-2.2.17.zip
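- Add the bin directory to the PATH (setx takes the variable name and the value as separate arguments, without an equals sign)
setx PATH "C:\sparkling-water-2.2.17\bin;%PATH%"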
- SPARK_HOME must also be set to the Spark installation directory
setx SPARK_HOME "C:\pathToSpark"
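- Set the master as an environment variable. For instance, local[*] runs Spark locally with one worker thread per logical core. To launch a local Spark cluster with 3 worker nodes, each with 2 cores and 1g of memory, use local-cluster[3,2,1024] instead.
setx MASTER "local[*]"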
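The commands above target the Windows command prompt. On a Unix-like shell the same three settings could be sketched as below. This is only an illustration: SW_HOME is a hypothetical variable name and the install location under $HOME is an assumption, while /pathToSpark is the same placeholder used above.

```shell
# Hypothetical unzip location of Sparkling Water; adjust to your machine.
SW_HOME="$HOME/sparkling-water-2.2.17"

# Placeholder path to the Spark installation, as in the steps above.
export SPARK_HOME="/pathToSpark"

# Run Spark locally with one worker thread per logical core.
export MASTER="local[*]"

# Put the Sparkling Water launch scripts (e.g. sparkling-shell) on the PATH.
export PATH="$SW_HOME/bin:$PATH"
```

Unlike setx, export only affects the current shell session, so these lines would typically go into a shell profile file.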
Artifacts (Jar File)
- Sparkling_water_home/assembly/build/libs/sparkling-water-assembly_*.jar
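As a rough sketch, the assembly jar matching the pattern above could be located with a small shell helper and then passed by hand to a plain Spark shell. The function name find_assembly_jar and the SW_HOME default are hypothetical; only the directory layout and the spark-shell --jars option come from the tools themselves.

```shell
# Hypothetical install location; override SW_HOME to match your machine.
SW_HOME="${SW_HOME:-$HOME/sparkling-water-2.2.17}"

# Print the first assembly jar found under the given Sparkling Water home.
# Returns a non-zero status when no jar matches the pattern.
find_assembly_jar() {
  for jar in "$1"/assembly/build/libs/sparkling-water-assembly_*.jar; do
    if [ -f "$jar" ]; then
      echo "$jar"
      return 0
    fi
  done
  echo "no assembly jar under $1" >&2
  return 1
}

# Usage (requires SPARK_HOME to point at a Spark installation):
#   "$SPARK_HOME/bin/spark-shell" --jars "$(find_assembly_jar "$SW_HOME")"
```

The helper only prints a path, so it can be checked without starting Spark.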
Shell
The Sparkling shell wraps a regular Spark shell and appends the Sparkling Water library to the classpath via the --jars option. It supports the creation of an H2O cloud and the execution of H2O algorithms.
sparkling-shell --conf "spark.executor.memory=1g"
- Then create the H2O context from the Spark session
import org.apache.spark.h2o._
val hc = H2OContext.getOrCreate(spark)
Documentation / Reference
- https://github.com/h2oai/sparkling-water - Good doc too