About
How to load data into Hive Table.
Options
This is are the following possibilities:
- File System operation
- SQL Operations
- Third party tools
File System
Replace the file on HDFS when the input data format is the same as the table format with:
- load statement without the INPUTFORMAT for loading local file (local to HDFS) - See Loadingfilesintotables
- an HDFS client such as
Third party tools
- Ambari: Ambari Admin > HiveView > Load data
SQL operations
INSERT can insert into a Hive table, local directory or HDFS directory
- insert as select - InsertingdataintoHiveTablesfromqueries
- insert value or update value statement (slow) - InsertingvaluesintotablesfromSQL or LanguageManualDML-Update
- insert overwrite
- load statement since version 3.0 as an input format that will process the data (not just move the data file but also the row)
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde']
- create as select
See also external table partition_load
Multi table insert
FROM S
INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, ds, hr FROM srcpart WHERE ds is not null and hr>10
INSERT OVERWRITE TABLE R PARTITION (ds='2010-03-03, hr=12)
SELECT key, value, ds, hr from srcpart where ds is not null and hr = 12;
export/import between cluster
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport - The EXPORT command exports the data of a table or partition, along with the metadata, into a specified output location. This output location can then be moved over to a different Hadoop or Hive instance and imported from there with the IMPORT command.
Example
local = not in HDFS
load data local inpath '" + filepath + "' into table " + tableName