Apache - Hive (HS|Hive Server)


Hive is a relational database developed on top of Hadoop to deliver data warehouse functionality.

It uses SQL queries (HiveQL) to run MapReduce jobs on Hadoop.

The Hive driver converts the HiveQL queries to Yarn jobs, and then sends the jobs to the Hadoop cluster.

It was designed to appeal to a community comfortable with SQL. It's philosophy was that we don't need yet another scripting language.

Developers write scripts with the HiveQL interface. Scripts are then compiled into MapReduce jobs. Results are typically stored in a flat file format in the Hadoop distributed filesystem (HadoopFS)

Hive has a ODBC/JDBC driver.

Hive is a better tools for very long running, batch-oriented tasks such as ETL tasks.

With Hive you never point to a single file, you always point to a directory.

Hive derived product

Documentation / Reference

Task Runner