Table of Contents

About

ORC Files or Optimized RC Files were invented to optimize performance in Hive and are primarily backed by HortonWorks.

Data in ORC files is fast to load because data stripes can be read in parallel. The rows in each data stripe are loaded sequentially. To optimize load time, use a data stripe size of approximately 256 MB or less.

Example Hive

CREATE TABLE lineitem_orc_part
    (L_ORDERKEY INT, L_PARTKEY INT,L_SUPPKEY INT, L_LINENUMBER INT,
     L_QUANTITY DOUBLE, L_EXTENDEDPRICE DOUBLE, L_DISCOUNT DOUBLE,
     L_TAX DOUBLE, L_RETURNFLAG STRING, L_LINESTATUS STRING,
     L_SHIPDATE_PS STRING, L_COMMITDATE STRING, L_RECEIPTDATE STRING,
     L_SHIPINSTRUCT STRING, L_SHIPMODE STRING, L_COMMENT      STRING)
PARTITIONED BY(L_SHIPDATE STRING)
STORED AS ORC;

Documentation