ML - Sparkling Water (H2O inside Spark)

About

Sparkling Water provides H2O's fast, scalable machine learning engine inside a Spark cluster.

Sparkling Water is distributed as a Spark application library which can be used by any Spark application.

Demo

Management

Installation

Docker

Local Installation

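  • Unzip the archive
unzip sparkling-water-2.2.17.zip
  • Add the bin directory to the path and point SPARK_HOME at the Spark installation. Note that setx takes the variable name and the value as separate arguments, and the new values are only visible in command prompts opened afterwards.
setx PATH "C:\sparkling-water-2.2.17\bin;%PATH%"
setx SPARK_HOME "/pathToSpark"
  • Set the master as an environment variable. For instance, local[*] runs Spark locally with one worker thread per logical core. To launch a local Spark cluster with 3 worker nodes with 2 cores and 1g per node, use local-cluster[3,2,1024] instead.
setx MASTER "local[*]"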
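
On Linux or macOS, the same environment setup can be sketched with export instead of setx. This is a minimal sketch, not the official procedure: the install directory $HOME/sparkling-water-2.2.17 is an assumption, and /pathToSpark is the same placeholder used in the Windows commands.

```shell
# Assumed install location; adjust to wherever the archive was unzipped.
SPARKLING_HOME="$HOME/sparkling-water-2.2.17"

# Put the Sparkling Water launchers (for example sparkling-shell) on the PATH.
export PATH="$SPARKLING_HOME/bin:$PATH"

# Placeholder path; point it at the real Spark installation.
export SPARK_HOME=/pathToSpark

# Run Spark locally with one worker thread per logical core.
export MASTER="local[*]"
```

Unlike setx, export only affects the current shell session; to keep the values, the lines would have to go into a shell profile such as ~/.bashrc.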

Artifacts (Jar File)

  • Sparkling_water_home/assembly/build/libs/sparkling-water-assembly_*.jar

For more, see https://github.com/h2oai/sparkling-water#maven

Shell

The Sparkling shell wraps a regular Spark shell and appends the Sparkling Water library to the classpath via the --jars option. It supports creating an H2O cloud and running H2O algorithms.

sparkling-shell --conf "spark.executor.memory=1g"
  • Then create the H2O context
import org.apache.spark.h2o._
val hc = H2OContext.getOrCreate(spark)
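
Since the Sparkling shell is described as a regular Spark shell with the library added through --jars, a rough manual equivalent can be sketched as a plain spark-shell invocation. The helper name sparkling_shell_cmd and both default paths are hypothetical, and the real sparkling-shell script may set further options that this sketch leaves out. The helper only prints the command so it can be reviewed before running it.

```shell
# Assumed locations; both are placeholders for this sketch.
SPARK_HOME="${SPARK_HOME:-/pathToSpark}"
SPARKLING_HOME="${SPARKLING_HOME:-$HOME/sparkling-water-2.2.17}"

# Hypothetical helper: prints the spark-shell invocation instead of running it.
sparkling_shell_cmd() {
  # Assembly jar location as listed under "Artifacts (Jar File)".
  jar="$SPARKLING_HOME/assembly/build/libs/sparkling-water-assembly_*.jar"
  printf '%s --jars %s' "$SPARK_HOME/bin/spark-shell" "$jar"
  # Append any extra spark-shell options unchanged.
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

sparkling_shell_cmd --conf "spark.executor.memory=1g"
```

The jar pattern is printed unexpanded; it should match exactly one assembly jar before the command is actually run.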

Documentation / Reference

