About

Relation - Column in Hive Context

Statistic

Column Statistics in Hive (HIVE-1362)

See Hive - Table-Level Statistics (Table/Partition/Column)

Built-in

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns

Hive 0.8.0 provides support for two virtual columns:

  • INPUT__FILE__NAME is the name of the input file being read by the mapper task.
-- Show, for each row, the file it was read from and its offset in that file
select 
  INPUT__FILE__NAME, 
  key, 
  BLOCK__OFFSET__INSIDE__FILE 
from 
  src;
 
-- Count the rows per key (INPUT__FILE__NAME is set on every row)
select 
  key, 
  count(INPUT__FILE__NAME) 
from 
  src 
group by key 
order by key;
  • BLOCK__OFFSET__INSIDE__FILE is the current global file position. For block-compressed files, it is the file offset of the current block, i.e. the offset of the block's first byte.
-- Keep only the rows located after byte 12000 of their file
select
  * 
from 
  src 
where
  BLOCK__OFFSET__INSIDE__FILE > 12000
order by key;
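To make the two columns more concrete, here is a minimal Python sketch of a simplified model, not Hive's implementation. It assumes an uncompressed, tab-delimited text file, where the offset reported for a row generally corresponds to the byte position at which that row's line starts. The file name `/tmp/src/part-00000`, the sample rows, and the helpers `line_offsets` and `scan` are hypothetical.

```python
from typing import Dict, Iterator, Tuple, Union

Row = Dict[str, Union[str, int]]


def line_offsets(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (start_offset, line) for each line, with the terminator stripped."""
    offset = 0
    # bytes.splitlines treats \n, \r and \r\n as line terminators.
    for raw in data.splitlines(keepends=True):
        yield offset, raw.rstrip(b"\r\n")
        offset += len(raw)  # the next line starts right after this terminator


def scan(file_name: str, data: bytes) -> Iterator[Row]:
    """Model a table scan that exposes both virtual columns on every row."""
    for offset, line in line_offsets(data):
        # Assumption: tab-delimited key/value rows; only the key is kept here.
        key = line.split(b"\t", 1)[0].decode()
        yield {
            "INPUT__FILE__NAME": file_name,
            "key": key,
            "BLOCK__OFFSET__INSIDE__FILE": offset,
        }


# Hypothetical file contents: three tab-delimited key/value rows.
data = b"238\tval_238\n86\tval_86\n311\tval_311\n"
rows = list(scan("/tmp/src/part-00000", data))
for r in rows:
    print(r["INPUT__FILE__NAME"], r["key"], r["BLOCK__OFFSET__INSIDE__FILE"])

# Analogue of: where BLOCK__OFFSET__INSIDE__FILE > 12
print([r["key"] for r in rows if r["BLOCK__OFFSET__INSIDE__FILE"] > 12])
```

The rows start at offsets 0, 12 and 22, so the final filter keeps only key `311`. The row at offset 12 is excluded because the comparison is strict. For compressed or columnar formats, the reported value instead refers to block positions, as described above, so this per-line model no longer applies.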