Hive - Column


About

This page describes columns (see Relation - Column) in the context of Hive.

Statistic

Column Statistics in Hive (HIVE-1362)

See Hive - Table-Level Statistics (Table/Partition/Column)

Built-in

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+VirtualColumns

Hive 0.8.0 provides support for two virtual columns:

  • INPUT__FILE__NAME is the input file's name for a mapper task.
select 
  INPUT__FILE__NAME, 
  key, 
  BLOCK__OFFSET__INSIDE__FILE 
from 
  src;
 
select 
  key, 
  count(INPUT__FILE__NAME) 
from 
  src 
group by key 
order by key;
  • BLOCK__OFFSET__INSIDE__FILE is the current global file position. For a block-compressed file, it is the file offset of the current block, that is, the offset of the block's first byte.
select
  * 
from 
  src 
where
  BLOCK__OFFSET__INSIDE__FILE > 12000
order by key;
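
The two virtual columns can also be combined, for example to see how rows are spread across input files or to locate a row by file and position. The following is a minimal sketch, assuming the same src table as in the queries above. The row_count alias is a hypothetical name chosen for illustration.

```sql
-- Count the rows read from each input file of the src table
-- (row_count is an illustrative alias, not a Hive built-in)
select
  INPUT__FILE__NAME,
  count(*) as row_count
from
  src
group by INPUT__FILE__NAME;

-- List each row together with its file and its position in that file
select
  INPUT__FILE__NAME,
  BLOCK__OFFSET__INSIDE__FILE,
  key
from
  src
order by INPUT__FILE__NAME, BLOCK__OFFSET__INSIDE__FILE;
```

For block-compressed files, several rows may report the same BLOCK__OFFSET__INSIDE__FILE value, since the value is then the offset of the block rather than of the row.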







