Hive - Text File (TEXTFILE)


About

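TEXTFILE is the default storage format of a table, unless the hive.default.fileformat property is set to another format. Each row is stored as one line of plain text.

Because TEXTFILE is normally the default, the STORED AS TEXTFILE clause is optional.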

Default

Delimiters

By default, fields are assumed to be delimited by ^A (Ctrl-A, octal \001) and rows by a newline character.
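To illustrate the default layout, here is a minimal Python sketch (not Hive's own code) that splits one line of a text file using Hive's default delimiters:

- ^A (\x01) between fields
- ^B (\x02) between collection items
- ^C (\x03) between a map key and its value

The three-column table (id INT, tags ARRAY<STRING>, attrs MAP<STRING,STRING>) and the function name parse_row are hypothetical.

```python
# Hive's default delimiters for text files (hypothetical three-column table below)
FIELD_SEP = "\x01"       # ^A between columns
COLLECTION_SEP = "\x02"  # ^B between array items / map entries
MAP_KEY_SEP = "\x03"     # ^C between a map key and its value


def parse_row(line: str) -> dict:
    """Split one row of a hypothetical table (id INT, tags ARRAY<STRING>, attrs MAP<STRING,STRING>)."""
    id_field, tags_field, attrs_field = line.rstrip("\n").split(FIELD_SEP)
    # Simplification for this sketch: an empty field becomes an empty collection
    tags = tags_field.split(COLLECTION_SEP) if tags_field else []
    attrs = {}
    if attrs_field:
        for entry in attrs_field.split(COLLECTION_SEP):
            key, value = entry.split(MAP_KEY_SEP, 1)
            attrs[key] = value
    return {"id": int(id_field), "tags": tags, "attrs": attrs}


row = "1\x01red\x02blue\x01size\x03L\x02fit\x03slim\n"
print(parse_row(row))
# {'id': 1, 'tags': ['red', 'blue'], 'attrs': {'size': 'L', 'fit': 'slim'}}
```

Because ^A, ^B and ^C rarely occur in real text, the defaults avoid most escaping problems. The trade-off is that the files are hard to read in an ordinary editor.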

Syntax

STORED AS TEXTFILE

Example with the customer table of the TPC-DS schema:

create external table customer_row
(
    c_customer_sk             bigint,
    c_customer_id             string,
    c_current_cdemo_sk        bigint,
    c_current_hdemo_sk        bigint,
    c_current_addr_sk         bigint,
    c_first_shipto_date_sk    bigint,
    c_first_sales_date_sk     bigint,
    c_salutation              string,
    c_first_name              string,
    c_last_name               string,
    c_preferred_cust_flag     string,
    c_birth_day               int,
    c_birth_month             int,
    c_birth_year              int,
    c_birth_country           string,
    c_login                   string,
    c_email_address           string,
    c_last_review_date        string
)
row format delimited fields terminated by '|' 
STORED AS TEXTFILE
LOCATION 'hdfs://locationToMyDirectory';
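As a rough model of how one '|'-delimited line maps onto the customer_row columns, the Python sketch below converts each field to its declared type. It rests on these assumptions, based on our understanding of the default text SerDe rather than a specification:

- \N reads as NULL.
- A number that cannot be parsed, including an empty field, reads as NULL.
- An empty string column stays an empty string.
- Missing trailing fields read as NULL, and extra fields, such as the one after a trailing '|', are ignored.

The sample row is made up, and parse_customer_line is a hypothetical helper.

```python
from typing import List, Optional

# Column names and types copied from the customer_row DDL above
CUSTOMER_COLUMNS = [
    ("c_customer_sk", "bigint"), ("c_customer_id", "string"),
    ("c_current_cdemo_sk", "bigint"), ("c_current_hdemo_sk", "bigint"),
    ("c_current_addr_sk", "bigint"), ("c_first_shipto_date_sk", "bigint"),
    ("c_first_sales_date_sk", "bigint"), ("c_salutation", "string"),
    ("c_first_name", "string"), ("c_last_name", "string"),
    ("c_preferred_cust_flag", "string"), ("c_birth_day", "int"),
    ("c_birth_month", "int"), ("c_birth_year", "int"),
    ("c_birth_country", "string"), ("c_login", "string"),
    ("c_email_address", "string"), ("c_last_review_date", "string"),
]
NULL_MARKER = "\\N"  # default NULL representation in Hive text files


def convert(raw: Optional[str], hive_type: str):
    """Convert one raw field to a Python value (assumed NULL rules, see lead-in)."""
    if raw is None or raw == NULL_MARKER:
        return None
    if hive_type in ("bigint", "int"):
        try:
            return int(raw)
        except ValueError:
            return None  # unparseable numbers, including "", read as NULL
    return raw  # string columns keep the text as-is, even when empty


def parse_customer_line(line: str, sep: str = "|") -> dict:
    fields: List[Optional[str]] = list(line.rstrip("\n").split(sep))
    # Pad missing trailing fields with NULL; zip() drops any extra fields
    fields += [None] * (len(CUSTOMER_COLUMNS) - len(fields))
    return {name: convert(raw, typ) for (name, typ), raw in zip(CUSTOMER_COLUMNS, fields)}


sample = r"1|AAAAAAAABAAAAAAA|100|200||2452238|2452208|Mr.|John|Doe|Y|9|12|1936|CANADA||john.doe@example.com|\N|"
row = parse_customer_line(sample)
print(row["c_current_addr_sk"], repr(row["c_login"]), row["c_last_review_date"])
# None '' None
```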

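In the ROW FORMAT clause, you can use the following options:

  • DELIMITED, with FIELDS TERMINATED BY, COLLECTION ITEMS TERMINATED BY, MAP KEYS TERMINATED BY and LINES TERMINATED BY, to set the delimiters
  • ESCAPED BY (after FIELDS TERMINATED BY) to set an escape character
  • NULL DEFINED AS to set a custom NULL format (default is \N)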
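The effect of ESCAPED BY and NULL DEFINED AS can be pictured with the Python sketch below. It is an approximation, not Hive's code:

- An escape character makes the next character literal, so an escaped separator stays inside the field.
- A field whose raw text equals the NULL marker becomes None.

Checking for NULL on the raw text, before unescaping, is an assumption of this sketch. The function names are hypothetical.

```python
from typing import List, Optional


def unescape(raw: str, escape: str) -> str:
    """Drop each escape character and keep the character that follows it literally."""
    out, i = [], 0
    while i < len(raw):
        if raw[i] == escape and i + 1 < len(raw):
            out.append(raw[i + 1])
            i += 2
        else:
            out.append(raw[i])
            i += 1
    return "".join(out)


def split_escaped(line: str, sep: str = "|", escape: str = "\\",
                  null_marker: str = "\\N") -> List[Optional[str]]:
    """Split on sep, skipping escaped separators; map null_marker fields to None."""
    raw_fields, current, i = [], [], 0
    while i < len(line):
        if line[i] == escape and i + 1 < len(line):
            current.append(line[i:i + 2])  # keep the escape pair; unescape after splitting
            i += 2
        elif line[i] == sep:
            raw_fields.append("".join(current))
            current = []
            i += 1
        else:
            current.append(line[i])
            i += 1
    raw_fields.append("".join(current))
    # NULL check on the raw text, before unescaping (assumption of this sketch)
    return [None if raw == null_marker else unescape(raw, escape) for raw in raw_fields]


print(split_escaped(r"1|a\|b|\N|c\\d"))
# ['1', 'a|b', None, 'c\\d']
```

Passing a different null_marker, for example "NULL", mimics a custom NULL DEFINED AS value.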

STORED AS INPUTFORMAT/OUTPUTFORMAT

STORED AS TEXTFILE is equivalent to naming the input and output formats explicitly:

STORED AS 
INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
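Conceptually, TextInputFormat turns a file into (byte offset, line) records. HiveIgnoreKeyTextOutputFormat writes only the value of each record, one line per row, and ignores the key. The Python sketch below models that pairing with in-memory bytes; it is an illustration, not the Hadoop or Hive classes. The function names are hypothetical.

```python
import io
from typing import BinaryIO, Iterable, Iterator, Tuple


def text_input_records(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (byte offset, line) pairs, similar in spirit to TextInputFormat's records."""
    offset = 0
    for line in io.BytesIO(data):  # iterating a BytesIO yields lines including b"\n"
        yield offset, line.rstrip(b"\r\n")
        offset += len(line)


def ignore_key_write(records: Iterable[Tuple[int, bytes]], out: BinaryIO) -> None:
    """Write only the value of each (key, value) pair, one line per row; the key is dropped."""
    for _key, value in records:
        out.write(value + b"\n")


records = list(text_input_records(b"1|a\n2|b\n"))
print(records)           # [(0, b'1|a'), (4, b'2|b')]
out = io.BytesIO()
ignore_key_write(records, out)
print(out.getvalue())    # b'1|a\n2|b\n'
```

The offset key is only useful while reading. Dropping it on write keeps the output a plain text file that the same input format can read back.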