Table implementation in Hive.
Hive also supports custom serializers/deserializers (SerDes) for complex or irregularly structured data.
The fully qualified name of a table in Hive is:
db_name.table_name
where db_name is the name of the database and table_name is the name of the table.
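As a minimal sketch of how the qualified name is used, assuming a hypothetical database sales_db and table page_view (both names are illustrative and not taken from this page):

```sql
-- Hypothetical names: sales_db and page_view are illustrative only.
CREATE DATABASE IF NOT EXISTS sales_db;

-- The fully qualified name works whatever the current database is.
CREATE TABLE IF NOT EXISTS sales_db.page_view (clientid STRING, country STRING);
SELECT clientid, country FROM sales_db.page_view LIMIT 10;

-- Alternatively, switch the current database and use the short name.
USE sales_db;
SELECT clientid, country FROM page_view LIMIT 10;
```

When no database is given, Hive resolves the table name against the current database, which is default unless changed with USE.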
By default, tables are assumed to be of text input format, with ^A (Ctrl-A) as the field delimiter.
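A table is either internal (managed) or external. The type is based on the location of the data and on who controls it.
Use external tables when the data is also used outside of Hive, or when it must remain in place after a DROP TABLE.
Use internal tables when the data is temporary, or when Hive should manage the lifecycle of both the table and its data.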
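The difference between the two table types can be sketched as follows. The table names and the HDFS path are hypothetical, chosen only for illustration:

```sql
-- Hypothetical table names and HDFS path, for illustration only.

-- Internal (managed) table: Hive owns the data under its warehouse directory.
CREATE TABLE IF NOT EXISTS staging_events (
  event_id BIGINT,
  payload  STRING
);

-- External table: Hive records only the metadata; the files stay at LOCATION.
CREATE EXTERNAL TABLE IF NOT EXISTS raw_events (
  event_id BIGINT,
  payload  STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/raw/events';

-- Dropping the managed table removes the metadata and the data files
-- (the files go to the trash first if the trash is enabled).
DROP TABLE IF EXISTS staging_events;

-- Dropping the external table removes the metadata only;
-- the files under /data/raw/events are left in place.
DROP TABLE IF EXISTS raw_events;
```

DESCRIBE FORMATTED reports the type in its Table Type field (MANAGED_TABLE or EXTERNAL_TABLE).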
-- Apache access log table; each regex capture group maps to one column, in order.
CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;
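ROW FORMAT specifies how the rows are serialized in the Hive table.
You can use either ROW FORMAT DELIMITED (built-in delimiter handling) or ROW FORMAT SERDE (a named SerDe class).
STORED AS designates the file format (for example TEXTFILE, SEQUENCEFILE, ORC or PARQUET).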
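As a minimal sketch of the two clauses working together, the statements below declare a tab-delimited text table and then copy it into an ORC table. The table and column names are hypothetical:

```sql
-- Hypothetical table and column names, for illustration only.

-- ROW FORMAT DELIMITED: fields separated by tabs, one row per line.
CREATE TABLE IF NOT EXISTS page_view_tsv (
  clientid  STRING,
  country   STRING,
  dwelltime DOUBLE
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;

-- Same rows in a columnar file format, created with CREATE TABLE AS SELECT.
CREATE TABLE page_view_orc
STORED AS ORC
AS SELECT clientid, country, dwelltime FROM page_view_tsv;
```

ROW FORMAT matters mostly for text-based formats; with ORC the format supplies its own SerDe, so no ROW FORMAT clause is needed.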
SHOW TABLES;
-- The pattern follows Java regular expression syntax (so the period is a wildcard).
SHOW TABLES 'Pattern';
-- Example
SHOW TABLES 'page.*';
SHOW CREATE TABLE tableName;
DESCRIBE FORMATTED tableName;
-- List the columns and column types of a table.
DESCRIBE EXTENDED page_view;
# col_name data_type comment
clientid string
querytime string
market string
deviceplatform string
devicemake string
devicemodel string
state string
country string
querydwelltime double
sessionid bigint
sessionpagevieworder bigint
# Detailed Table Information
Database: default
Owner: root
CreateTime: Thu Jan 25 13:59:12 UTC 2018
LastAccessTime: UNKNOWN
Protect Mode: None
Retention: 0
Location: wasb://hi-clus-hadoop-01-2018-01-25t11-58-44-894z@hiinformaticasawe.blob.core.windows.net/hive/warehouse/hivesampletable
Table Type: MANAGED_TABLE
Table Parameters:
numFiles 1
numRows 0
rawDataSize 0
totalSize 4955715
transient_lastDdlTime 1516888756
# Storage Information
SerDe Library: org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat: org.apache.hadoop.mapred.TextInputFormat
OutputFormat: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed: No
Num Buckets: -1
Bucket Columns: []
Sort Columns: []
Storage Desc Params:
field.delim \t
serialization.format \t
Time taken: 0.449 seconds, Fetched: 41 row(s)
where:
DROP TABLE IF EXISTS tableName;
Each table can have one or more partitions.
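A partitioned table can be sketched as follows. The table name, partition column and partition value are hypothetical:

```sql
-- Hypothetical table, partition column and value, for illustration only.
CREATE TABLE IF NOT EXISTS page_view_by_day (
  clientid STRING,
  country  STRING
)
PARTITIONED BY (view_date STRING);

-- Register a partition explicitly.
ALTER TABLE page_view_by_day ADD IF NOT EXISTS PARTITION (view_date = '2018-01-25');

-- List the partitions known to the metastore.
SHOW PARTITIONS page_view_by_day;

-- Filtering on the partition column lets Hive read only the matching partition.
SELECT clientid, country FROM page_view_by_day WHERE view_date = '2018-01-25';
```

The partition column is declared only in PARTITIONED BY, not in the regular column list, yet it can be queried like any other column.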
With Beeline:
LOAD DATA INPATH '/user/gerardn/myTable.csv' INTO TABLE myTable;
Table default.myTable stats: [numFiles=1, numRows=0, totalSize=266712, rawDataSize=0]
OK
Time taken: 3.502 seconds
where:
CREATE EXTERNAL TABLE IF NOT EXISTS myTable
(
  col1 string,
  col2 int,
  col3 double
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' LINES TERMINATED BY '\n'
TBLPROPERTIES("skip.header.line.count"="1");
All the data of a table is stored in a directory in HDFS.
A partitioned table creates one sub-directory per partition column value. See Partitioned column.
Table and partition statistics are stored in the Hive Metastore.
See Hive - Table-Level Statistics (Table/Partition/Column).
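The LOAD DATA output above reports numRows=0 because loading a file does not scan it. As a minimal sketch, the statistics for the myTable table used on this page can be computed and inspected like this:

```sql
-- Compute table-level statistics (numRows, rawDataSize, ...) and store them in the metastore.
ANALYZE TABLE myTable COMPUTE STATISTICS;

-- Compute column-level statistics as well.
ANALYZE TABLE myTable COMPUTE STATISTICS FOR COLUMNS;

-- The table-level values should then appear under "Table Parameters".
DESCRIBE FORMATTED myTable;
```

For a partitioned table, ANALYZE TABLE also accepts a PARTITION clause to restrict the computation to specific partitions.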
Not yet supported; see https://issues.apache.org/jira/browse/HIVE-1558