About
The Csv Serde is a serde that is applied above a text file. It's one way of reading a CSV / TSV format.
Articles Related
Architecture
- The CSVSerde is available in Hive 0.14 and greater.
- It was added to the Hive distribution in HIVE-7777.
- The CSVSerde has been built and tested against Hive 0.14 and later, and uses Open-CSV 2.3 which is bundled with the Hive distribution.
Limitations
This SerDe:
- does not handle embedded newlines.
- treats all columns to be of type String. Even if you create a table with non-string column types using this SerDe, the DESCRIBE TABLE output would show string column type. To convert columns to the desired type in a table, you can create a view over the table that does the CAST to the desired type.
Syntax
The following example creates a TSV (Tab-separated) file.
CREATE TABLE my_table(a string, b string, ...)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "\t",
"quoteChar" = "'",
"escapeChar" = "\\"
)
STORED AS TEXTFILE;
where:
- the fully qualified class name org.apache.hadoop.hive.serde2.OpenCSVSerde must be specified.
- the default properties value are
DEFAULT_ESCAPE_CHARACTER \
DEFAULT_QUOTE_CHARACTER "
DEFAULT_SEPARATOR ,
Example
With the tpcds customer table
Step 1 - Create the staging external table
CREATE EXTERNAL TABLE `customer_dat`(
`c_customer_sk` int,
`c_customer_id` char(16),
`c_current_cdemo_sk` int,
`c_current_hdemo_sk` int,
`c_current_addr_sk` int,
`c_first_shipto_date_sk` int,
`c_first_sales_date_sk` int,
`c_salutation` char(10),
`c_first_name` char(20),
`c_last_name` char(30),
`c_preferred_cust_flag` char(1),
`c_birth_day` int,
`c_birth_month` int,
`c_birth_year` int,
`c_birth_country` varchar(20),
`c_login` char(13),
`c_email_address` char(50),
`c_last_review_date_sk` int)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
"separatorChar" = "|"
)
LOCATION 'hdfs://locationToADirectory';
where:
- location is a directory where the tpcds customer data can be found
Et voila:
Step 2 - Load the data into the target table with data type
FROM `customer_stg`
INSERT OVERWRITE TABLE `customer`
SELECT *;