Hadoop - Sqoop
About
Sqoop is designed to:
- import tables from a database into HDFS.
- export HDFS data into a database
Sqoop is a Hadoop command-line program that transfers data between:
- structured data sources (generally relational databases),
- semi-structured data sources (Cassandra, HBase, …),
- unstructured data sources (file systems such as HDFS).
The transfers run as MapReduce programs.
You can use Sqoop to import and export data.
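As a rough illustration of the two directions, the sketch below prints one import and one export command line. The JDBC URL, user name, table names, and HDFS paths are hypothetical placeholders. The functions only echo the commands, so they can be reviewed without a database or a cluster.

```shell
# Hypothetical connection details (assumptions, not taken from a real system).
JDBC_URL="jdbc:mysql://db.example.com/sales"
DB_USER="etl_user"

# Import: database table -> HDFS directory.
# -P prompts for the password; --num-mappers sets the number of parallel map tasks.
sqoop_import_cmd() {
  echo sqoop import \
    --connect "$JDBC_URL" \
    --username "$DB_USER" -P \
    --table ORDERS \
    --target-dir /data/sales/orders \
    --num-mappers 4
}

# Export: HDFS directory -> existing database table.
sqoop_export_cmd() {
  echo sqoop export \
    --connect "$JDBC_URL" \
    --username "$DB_USER" -P \
    --table ORDERS_SUMMARY \
    --export-dir /data/sales/orders_summary
}

# Print both command lines for review.
sqoop_import_cmd
sqoop_export_cmd
```

Removing the leading `echo` inside a function would run the command itself, which assumes Sqoop, a JDBC driver, and a reachable database.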
Architecture
The sqoop command-line program is a wrapper which runs the bin/hadoop script shipped with Hadoop.
Installation / Configuration
################
## Binary location
################
export HADOOP_COMMON_HOME=/path/to/some/hadoop
export HADOOP_MAPRED_HOME=/path/to/some/hadoop-mapreduce
# if HADOOP_COMMON_HOME and HADOOP_MAPRED_HOME are not set, Sqoop falls back to HADOOP_HOME
export HADOOP_HOME=
# if HADOOP_HOME is not set either, Sqoop uses the default installation locations for Apache Bigtop:
# * /usr/lib/hadoop
# * /usr/lib/hadoop-mapreduce
################
## Configuration location
################
# default value, used when HADOOP_CONF_DIR is not set
export HADOOP_CONF_DIR="$HADOOP_HOME/conf/"
################
## Run
################
sqoop import --arguments...
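The fallback order described in the comments above can be sketched with shell parameter expansion. This is only an illustration of the lookup order, not Sqoop's actual startup script, and the function names are hypothetical.

```shell
# Lookup order: 1. the specific variable, 2. HADOOP_HOME, 3. the Apache Bigtop default.
# ":-" treats an empty variable the same as an unset one.
resolve_hadoop_common_home() {
  echo "${HADOOP_COMMON_HOME:-${HADOOP_HOME:-/usr/lib/hadoop}}"
}

resolve_hadoop_mapred_home() {
  echo "${HADOOP_MAPRED_HOME:-${HADOOP_HOME:-/usr/lib/hadoop-mapreduce}}"
}

# Show what would be used in the current environment.
echo "common home: $(resolve_hadoop_common_home)"
echo "mapred home: $(resolve_hadoop_mapred_home)"
```

Running this before `sqoop import` is a quick way to check which Hadoop installation the current shell environment points at.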
Management
Configuration
To use Sqoop, you supply the JDBC connection properties of the database (connect string, user name, password) and run the resulting job in the Hadoop environment.
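One way to keep the JDBC connection settings in a single place is a Sqoop options file, passed with `--options-file`. The sketch below writes such a file and prints the command that would use it. The file location, JDBC URL, user name, and table name are hypothetical.

```shell
# Hypothetical location for the options file.
OPTIONS_FILE="${TMPDIR:-/tmp}/sqoop-import-options.txt"

# One option or value per line; lines starting with # are comments.
cat > "$OPTIONS_FILE" <<'EOF'
# Connection settings (placeholder values)
import
--connect
jdbc:mysql://db.example.com/sales
--username
etl_user
EOF

# Command that would reuse the file; the remaining arguments stay on the command line.
echo "sqoop --options-file $OPTIONS_FILE --table ORDERS -P"
```

The password is deliberately left out of the file; `-P` asks for it at run time.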
Location
Users of a packaged deployment of Sqoop (such as an RPM shipped with Apache Bigtop) will see this program installed as:
/usr/bin/sqoop
Tool
Sqoop is a collection of related tools.
- sqoop - the main command-line tool, which calls the other tools
- sqoop-codegen
- sqoop-create-hive-table
- sqoop-eval - primitive SQL execution shell
- sqoop-export
- sqoop-help
- sqoop-import
- sqoop-import-all-tables - import all the tables of a database into HDFS
- sqoop-job
- sqoop-list-databases - list the available database schemas
- sqoop-list-tables - list the available tables within a schema
- sqoop-merge
- sqoop-metastore
- sqoop-version
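Each tool can be called as `sqoop <tool-name>` or through its `sqoop-<tool-name>` alias. The sketch below shows a few typical calls through a small dry-run helper. The helper `sqoop_tool`, the `RUN_SQOOP` switch, and the connection values are hypothetical and only there so that the commands can be printed without a Sqoop installation.

```shell
# Dry-run helper (hypothetical): prints the command unless RUN_SQOOP=1 is set.
sqoop_tool() {
  if [ "${RUN_SQOOP:-0}" = "1" ]; then
    sqoop "$@"
  else
    echo "sqoop $*"
  fi
}

# Placeholder connection values.
JDBC_URL="jdbc:mysql://db.example.com/sales"
DB_USER="etl_user"

sqoop_tool help                 # list the available tools
sqoop_tool help import          # show the usage of one tool
sqoop_tool version
sqoop_tool list-databases --connect "jdbc:mysql://db.example.com/" --username "$DB_USER" -P
sqoop_tool list-tables --connect "$JDBC_URL" --username "$DB_USER" -P
sqoop_tool eval --connect "$JDBC_URL" --username "$DB_USER" -P --query "SELECT COUNT(*) FROM ORDERS"
```

In dry-run mode the quoting of the `--query` value is not preserved in the printed line; the printed commands are meant for reading, not for copy and paste.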