ML - Sparkling Water (H2O inside Spark)

About

Sparkling Water provides H2O's fast, scalable machine learning engine inside a Spark cluster.

Sparkling Water is distributed as a Spark application library that can be used by any Spark application.

Demo

https://github.com/h2oai/sparkling-water/tree/master/examples

Management

Installation

Docker

See https://github.com/h2oai/sparkling-water/tree/master/docker

Local Installation

Install Spark locally
Get the zip file at http://h2o-release.s3.amazonaws.com/sparkling-water/rel-[SW Major Version]/[SW Minor Version]/index.html (Example: http://h2o-release.s3.amazonaws.com/sparkling-water/rel-2.2/17/index.html)
Unzip it

unzip sparkling-water-2.2.17.zip
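
The download URL in the steps above follows a fixed pattern, so it can be derived from the full version number. The following is a minimal sketch in POSIX shell, assuming the version has the form shown in the example (2.2.17, where 2.2 is the [SW Major Version] and 17 the [SW Minor Version]). The variable names are illustrative only.

```shell
# Full Sparkling Water version, as it appears in the zip file name.
SW_VERSION="2.2.17"

# Everything before the last dot is the release line used in "rel-..." (2.2);
# everything after the last dot is the minor version (17).
SW_MAJOR="${SW_VERSION%.*}"
SW_MINOR="${SW_VERSION##*.}"

# Build the index page URL from the pattern given in the steps.
SW_URL="http://h2o-release.s3.amazonaws.com/sparkling-water/rel-${SW_MAJOR}/${SW_MINOR}/index.html"
echo "$SW_URL"
```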

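Add the bin directory to the PATH. setx takes the variable name and the value as two separate arguments (no equals sign) and only affects command prompts opened afterwards.

setx PATH "C:\sparkling-water-2.2.17\bin;%PATH%"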

SPARK_HOME must also be set to the Spark installation directory

setx SPARK_HOME "C:\pathToSpark"

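Set the master as an environment variable. For instance, local[*] runs Spark locally with as many worker threads as there are cores. To launch a local Spark cluster with 3 worker nodes with 2 cores and 1g per node, use local-cluster[3,2,1024] instead.

setx MASTER "local[*]"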
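
The setx commands above are specific to Windows. On Linux or macOS, the same three settings could be sketched in a POSIX shell as follows. Both paths are placeholders, and SPARKLING_WATER_HOME is a variable name used here for illustration only. Unlike setx, export only affects the current session, so these lines would typically go in a shell profile.

```shell
# Placeholder locations (assumptions): adjust both to your machine.
export SPARK_HOME="/pathToSpark"
SPARKLING_WATER_HOME="/pathToSparklingWater/sparkling-water-2.2.17"

# Run Spark locally, using all available cores.
export MASTER="local[*]"

# Put the Sparkling Water launch scripts on the PATH for this session.
export PATH="$SPARKLING_WATER_HOME/bin:$PATH"
```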

Artifacts (Jar File)

Sparkling_water_home/assembly/build/libs/sparkling-water-assembly_*.jar

For more, see https://github.com/h2oai/sparkling-water#maven

Shell

The Sparkling Shell wraps a regular Spark shell and appends the Sparkling Water library to the classpath via the --jars option. It supports the creation of an H2O cloud and the execution of H2O algorithms.

sparkling-shell --conf "spark.executor.memory=1g"
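
To make the wrapping described above concrete, here is a rough sketch of the kind of spark-shell command line that results. It is not the actual sparkling-shell script, which may set further options. The helper sparkling_shell_cmd and the jar path are hypothetical, and the command is only printed, not run.

```shell
# Print a spark-shell command line that adds the Sparkling Water
# assembly jar to the classpath via --jars.
# Hypothetical helper for illustration; not part of the distribution.
sparkling_shell_cmd() {
  sw_jar="$1"   # path to the Sparkling Water assembly jar
  shift         # remaining arguments are passed through to spark-shell
  cmd="${SPARK_HOME:?SPARK_HOME must be set}/bin/spark-shell --jars $sw_jar"
  for arg in "$@"; do
    cmd="$cmd $arg"
  done
  printf '%s\n' "$cmd"
}

# Placeholder paths (assumptions): adjust to your installation.
SPARK_HOME="/pathToSpark"
sparkling_shell_cmd "/path/to/sparkling-water-assembly.jar" --conf "spark.executor.memory=1g"
```

Any options given after the jar path, such as the --conf setting used above, are forwarded unchanged to spark-shell.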

Then, inside the shell, create the H2O context from the Spark session (spark):

import org.apache.spark.h2o._
val hc = H2OContext.getOrCreate(spark)

Documentation / Reference

https://github.com/h2oai/sparkling-water - Good doc too