Hive - Load data

Card Puncher Data Processing

About

How to load data into Hive Table.

Options

This is are the following possibilities:

  • File System operation
  • SQL Operations
  • Third party tools

File System

Replace the file on HDFS when the input data format is the same as the table format with:

  • load statement without the INPUTFORMAT for loading local file (local to HDFS) - See Loadingfilesintotables
  • an HDFS client such as
    • WebHDFS: A Web URL provides an Upload end-point into a designated HDFS folder.
    • NFS mount

Third party tools

SQL operations

INSERT can insert into a Hive table, local directory or HDFS directory

LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)] [INPUTFORMAT 'inputformat' SERDE 'serde'] 
  • create as select

See also external table partition_load

Multi table insert

FROM S
INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, ds, hr FROM srcpart WHERE ds is not null and hr>10
INSERT OVERWRITE TABLE R PARTITION (ds='2010-03-03, hr=12)
SELECT key, value, ds, hr from srcpart where ds is not null and hr = 12;

export/import between cluster

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+ImportExport - The EXPORT command exports the data of a table or partition, along with the metadata, into a specified output location. This output location can then be moved over to a different Hadoop or Hive instance and imported from there with the IMPORT command.

Example

local = not in HDFS

load data local inpath '" + filepath + "' into table " + tableName





Discover More
Card Puncher Data Processing
Hive - External Table

external table are external because the data is stored outside the data warehouse. Use external tables when: The data is also used outside of Hive. For example, the data files are updated by another...



Share this page:
Follow us:
Task Runner