Hive - Parquet


About

How to store a Hive table on disk in the Parquet format.

Parquet is supported natively in Hive 0.13 and later.

Example

CREATE TABLE parquet_test (
  id int,
  str string,
  mp MAP<STRING,STRING>,
  lst ARRAY<STRING>,
  strct STRUCT<A:STRING,B:STRING>)
PARTITIONED BY (part string)
STORED AS PARQUET;
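
As a rough sketch of how this table could be used, the statements below write one row into a partition and read the complex columns back. The literal values and the partition name 'p1' are made up for illustration, and the SELECT without a FROM clause assumes Hive 0.13 or later.

```sql
-- Hypothetical sample row; every literal value here is illustrative only.
INSERT INTO TABLE parquet_test PARTITION (part = 'p1')
SELECT
  1,
  'one',
  map('k1', 'v1'),
  array('x', 'y'),
  named_struct('a', 'left', 'b', 'right');

-- Read back one element from each complex column.
SELECT id, str, mp['k1'], lst[0], strct.a
FROM parquet_test
WHERE part = 'p1';
```

The struct field names are written in lower case in named_struct so that they match how the table definition is expected to store them.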
  • Wifi data table (external table with the Parquet SerDe and input/output formats written out explicitly)
CREATE EXTERNAL TABLE `inhome_Wifi`(
  `attrvaluestring` string, 
  `polling_timestamp` timestamp, 
  `modemtype` string, 
  `modemmac` string, 
  `oid` string, 
  `mib_index` int, 
  `assdevindex` int, 
  `attrvalueinteger` int, 
  `attrvaluefloat` float)
PARTITIONED BY ( 
  `partitioning_year` int, 
  `partitioning_month` int, 
  `partitioning_day` int)
ROW FORMAT SERDE 
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  'maprfs:/production/inhome_wifi/inhome_wifi'
TBLPROPERTIES (
  'last_modified_by'='mapr', 
  'last_modified_time'='1512370733', 
  'transient_lastDdlTime'='1512370733');
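
In Hive 0.13 and later, STORED AS PARQUET is shorthand for the SerDe, input format, and output format classes written out above, so a table of this kind can be declared more compactly. The sketch below is not the production definition: the table name inhome_wifi_short, the reduced column list, the location, and the partition values in the query are all hypothetical.

```sql
-- Hypothetical compact declaration; table name and location are placeholders.
CREATE EXTERNAL TABLE inhome_wifi_short (
  modemmac string,
  oid string,
  attrvalueinteger int)
PARTITIONED BY (
  partitioning_year int,
  partitioning_month int,
  partitioning_day int)
STORED AS PARQUET
LOCATION '/tmp/inhome_wifi_short';

-- Register partition directories that already exist under the location
-- (assumes a key=value layout such as partitioning_year=2017/...).
MSCK REPAIR TABLE inhome_wifi_short;

-- Filtering on the partition columns lets Hive skip non-matching directories.
SELECT modemmac, oid, attrvalueinteger
FROM inhome_wifi_short
WHERE partitioning_year = 2017
  AND partitioning_month = 12
  AND partitioning_day = 4;
```

For an external table, dropping the table removes only the metadata; the Parquet files under the location are left in place.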
