Hadoop - Sqoop

About

Sqoop is designed to:

  • import tables from a database into HDFS.
  • export HDFS data into a database

Sqoop is a Hadoop command-line program that transfers data between data sources (a database and HDFS) through MapReduce programs.

You can use Sqoop to import and export data.
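The two directions can be sketched as a pair of command lines. This is a minimal sketch under stated assumptions, not a tested recipe: the JDBC URL, user name, table names, and HDFS directories are hypothetical placeholders, and the helper function names are invented for illustration. The commands are only printed, so the snippet runs without a Sqoop installation.

```shell
# Hypothetical placeholders: adjust to your own database and cluster.
JDBC_URL="jdbc:mysql://db.example.com/sales"
DB_USER="etl"

# Import: copy the (hypothetical) table "orders" from the database into HDFS.
print_import() {
  echo sqoop import \
    --connect "$JDBC_URL" --username "$DB_USER" -P \
    --table orders --target-dir /user/etl/orders
}

# Export: push the files under an HDFS directory into an existing table.
print_export() {
  echo sqoop export \
    --connect "$JDBC_URL" --username "$DB_USER" -P \
    --table orders_copy --export-dir /user/etl/orders
}

# Print the commands instead of running them.
print_import
print_export
```

The -P option asks for the password on the console, which keeps it out of the shell history.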

Sqoop is a collection of related tools.

Architecture

The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop.

Installation / Configuration

################
## Binary location
################
export HADOOP_COMMON_HOME=/path/to/some/hadoop
export HADOOP_MAPRED_HOME=/path/to/some/hadoop-mapreduce
# if HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME are not set, Sqoop falls back to HADOOP_HOME
export HADOOP_HOME=
# if HADOOP_HOME is not set either, Sqoop uses the default installation locations for Apache Bigtop:
#    * /usr/lib/hadoop
#    * /usr/lib/hadoop-mapreduce

################
## Configuration location
################
# optional: when HADOOP_CONF_DIR is not set, the configuration is loaded from $HADOOP_HOME/conf/
export HADOOP_CONF_DIR=$HADOOP_HOME/conf/


################
## Run
################
sqoop import --arguments...
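The lookup order for the Hadoop binaries can be sketched as a small shell function. This only mirrors the order given in the comments of the block above; it is not Sqoop's actual startup script, and the function name is hypothetical.

```shell
# Hypothetical helper: print the directory that would be used for the
# Hadoop common binaries, following the documented fallback order.
resolve_hadoop_common_home() {
  if [ -n "${HADOOP_COMMON_HOME:-}" ]; then
    # 1. An explicit HADOOP_COMMON_HOME wins.
    echo "$HADOOP_COMMON_HOME"
  elif [ -n "${HADOOP_HOME:-}" ]; then
    # 2. Otherwise fall back to HADOOP_HOME.
    echo "$HADOOP_HOME"
  else
    # 3. Otherwise use the Apache Bigtop default location.
    echo "/usr/lib/hadoop"
  fi
}

resolve_hadoop_common_home
```

Running the function before calling sqoop is a quick way to check which installation the current environment points at.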

Management

Configuration

To use Sqoop, you must supply the JDBC connection properties of the database and run the job in the Hadoop environment.
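One way to keep the JDBC connection properties in a single place is a Sqoop options file, which holds one option or value per line and is passed with --options-file. The following is a minimal sketch: the file name, JDBC URL, and user name are hypothetical, and the final command is only printed, not executed.

```shell
# Hypothetical file name and connection values.
OPTIONS_FILE="${TMPDIR:-/tmp}/sqoop-connection.txt"

# An options file holds one option or one value per line.
cat > "$OPTIONS_FILE" <<'EOF'
--connect
jdbc:mysql://db.example.com/sales
--username
etl
EOF

# Print the command that would reuse these settings (not executed here).
echo "sqoop list-tables --options-file $OPTIONS_FILE -P"
```

The same file can then be shared by several tools, so the connection string is not repeated on every command line.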

Location

Users of a packaged deployment of Sqoop (such as an RPM shipped with Apache Bigtop) will see this program installed in:

/usr/bin/

Tool

Sqoop is a collection of related tools.

  • sqoop - the Sqoop CLI, the main tool that calls the others
  • sqoop-codegen
  • sqoop-create-hive-table
  • sqoop-eval - primitive SQL execution shell
  • sqoop-export
  • sqoop-help
  • sqoop-import
  • sqoop-import-all-tables - import all the tables of a database into HDFS
  • sqoop-job
  • sqoop-list-databases - list the available database schemas
  • sqoop-list-tables - list the available tables within a schema
  • sqoop-merge
  • sqoop-metastore
  • sqoop-version
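Each tool in the list can be invoked either through the main program, as sqoop followed by the tool name, or through its sqoop-(toolname) alias script. The mapping between the two forms is sketched below; the helper name is hypothetical and the snippet only prints the equivalent command lines.

```shell
# Hypothetical helper: turn an alias script name into the "sqoop <tool>" form
# by stripping the leading "sqoop-" prefix.
alias_to_tool() {
  echo "sqoop ${1#sqoop-}"
}

# Show the equivalent invocation for a few of the tools listed above.
for name in sqoop-import sqoop-list-tables sqoop-version; do
  alias_to_tool "$name"
done
```

For example, sqoop-list-tables and sqoop list-tables select the same tool; sqoop help prints the tools available in the installed version.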

Run / Job

Documentation / Reference
