ML - Sparkling Water (H2O inside Spark)


About

Sparkling Water provides H2O's fast, scalable machine learning engine inside a Spark cluster.

Sparkling Water is distributed as a Spark application library that can be used by any Spark application.

Demo

Management

Installation

Docker

See https://github.com/h2oai/sparkling-water/tree/master/docker

Local Installation

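  • Unzip the distribution
unzip sparkling-water-2.2.17.zip
  • Add the bin directory to the path and point SPARK_HOME to the Spark installation (setx takes the variable name and the value as separate arguments, without an equals sign)
setx PATH "C:\sparkling-water-2.2.17\bin;%PATH%"
setx SPARK_HOME "C:\pathToSpark"
  • Set the master as an environment variable. For instance, local[*] runs Spark locally with as many worker threads as there are logical cores. To launch a local Spark cluster with 3 worker nodes with 2 cores and 1g per node, use local-cluster[3,2,1024] instead.
setx MASTER "local[*]"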

Artifacts (Jar File)

  • Sparkling_water_home/assembly/build/libs/sparkling-water-assembly_*.jar

For more, see https://github.com/h2oai/sparkling-water#maven
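
As an alternative to the assembly jar, a Spark application can declare Sparkling Water as a build dependency. The sbt fragment below is a minimal sketch only: the coordinates (ai.h2o, sparkling-water-core), the Scala version and the Spark version are assumptions that should be checked against the Maven section linked above before use.

```scala
// build.sbt (sketch; coordinates and versions are assumptions, verify them
// against the Sparkling Water README for your release)
scalaVersion := "2.11.12"

libraryDependencies ++= Seq(
  // Spark itself is supplied by the cluster at run time
  "org.apache.spark" %% "spark-sql" % "2.2.1" % "provided",
  // Sparkling Water core library, matching the 2.2.17 distribution used above
  "ai.h2o" %% "sparkling-water-core" % "2.2.17"
)
```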

Shell

The Sparkling shell wraps a regular Spark shell and appends the Sparkling Water library to the classpath via the --jars option. It supports the creation of an H2O cloud and the execution of H2O algorithms.

sparkling-shell --conf "spark.executor.memory=1g"
  • Then create the H2O context inside the shell
import org.apache.spark.h2o._
val hc = H2OContext.getOrCreate(spark)
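
Once the context exists, a Spark DataFrame can be handed over to H2O. The following is a minimal sketch meant to be pasted into the Sparkling shell, where the spark session is predefined. The toy data and the names df and h2oFrame are hypothetical, and the calls should be checked against the Sparkling Water version in use.

```scala
// Run inside sparkling-shell, where `spark` (a SparkSession) already exists.
import org.apache.spark.h2o._
import spark.implicits._

// Start (or reuse) the H2O cloud on top of the Spark cluster
val hc = H2OContext.getOrCreate(spark)

// Hypothetical toy data, for illustration only
val df = Seq((1, "a"), (2, "b"), (3, "c")).toDF("id", "label")

// Convert the Spark DataFrame to an H2O frame
val h2oFrame = hc.asH2OFrame(df)

// Basic sanity check on the converted frame
println(s"rows=${h2oFrame.numRows()}, cols=${h2oFrame.numCols()}")
```

The resulting H2O frame can then be passed to H2O algorithms as a training frame.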
