About
(Data Type | Data Structure) in Hive
System
Hive supports the following categories of data types:
- primitive
- complex
Primitive
Data Type - (Primitive|Native|Built-in) in Hive
Category | Type | Description |
---|---|---|
Integers | TINYINT | 1 byte integer |
Integers | SMALLINT | 2 byte integer |
Integers | INT | 4 byte integer |
Integers | BIGINT | 8 byte integer |
Boolean | BOOLEAN | TRUE/FALSE |
Floating point numbers | FLOAT | single precision |
Floating point numbers | DOUBLE | double precision |
Fixed point numbers | DECIMAL | a fixed point value of user defined scale and precision |
String | STRING | sequence of characters in a specified character set |
String | VARCHAR | sequence of characters in a specified character set with a maximum length |
String | CHAR | sequence of characters in a specified character set with a defined length |
Date and time | TIMESTAMP | a specific point in time, up to nanosecond precision |
Date and time | DATE | a date |
Binary | BINARY | a sequence of bytes |
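As a rough illustration of the table above, the following sketch declares one column per primitive type. The table and column names are hypothetical, and the precision, scale, and length arguments are arbitrary choices for the example.

```sql
-- Hypothetical table: one column for each primitive type listed above.
CREATE TABLE type_demo (
  tiny_col   TINYINT,         -- 1 byte integer
  small_col  SMALLINT,        -- 2 byte integer
  int_col    INT,             -- 4 byte integer
  big_col    BIGINT,          -- 8 byte integer
  flag_col   BOOLEAN,         -- TRUE/FALSE
  float_col  FLOAT,           -- single precision
  double_col DOUBLE,          -- double precision
  price_col  DECIMAL(10, 2),  -- user-defined precision (10) and scale (2)
  name_col   STRING,          -- unbounded character sequence
  code_col   VARCHAR(20),     -- at most 20 characters
  fixed_col  CHAR(3),         -- fixed length of 3 characters
  ts_col     TIMESTAMP,       -- point in time
  date_col   DATE,            -- calendar date
  raw_col    BINARY           -- sequence of bytes
);
```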
Type hierarchy:
- Primitive Type
  - Number
    - DOUBLE
      - FLOAT
        - BIGINT
          - INT
            - SMALLINT
              - TINYINT
  - STRING
  - BOOLEAN
The hierarchy defines how types are implicitly converted: implicit conversion is allowed from a child type to any of its ancestors.
Note that the type hierarchy allows the implicit conversion of STRING to DOUBLE.
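The conversion rules can be sketched with a few literal expressions. This is a minimal illustration, assuming a Hive version that accepts SELECT without a FROM clause. The comments describe the conversion expected from the hierarchy rather than verified output.

```sql
-- Child to ancestor: the TINYINT operand is expected to be widened to BIGINT.
SELECT CAST(1 AS TINYINT) + CAST(2 AS BIGINT);

-- Child to ancestor: the INT operand is expected to be widened to DOUBLE.
SELECT CAST(1 AS INT) + CAST(2.5 AS DOUBLE);

-- Special case noted above: the STRING operand is implicitly converted to DOUBLE.
SELECT '1.5' + 1;

-- Ancestor to child is not implicit: narrowing requires an explicit CAST.
SELECT CAST(CAST(2.7 AS DOUBLE) AS INT);
```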
Complex
Data Type - Complex Data Type in Hive
Complex Types can be built up from primitive types and other composite types using:
- Structs: the elements within the type can be accessed using the dot (.) notation. For example, for a column c of type STRUCT {a INT; b INT}, the a field is accessed by the expression c.a.
- Maps (key-value tuples): the elements are accessed using the ['element name'] notation. For example, in a map M comprising a mapping from 'group' → gid, the gid value can be accessed using M['group'].
- Arrays (indexable lists): the elements in the array have to be of the same type. Elements can be accessed using the [n] notation, where n is a zero-based index into the array. For example, for an array A having the elements ['a', 'b', 'c'], A[1] returns 'b'.
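The three access notations above can be sketched in a single query. The table name complex_demo and the column name arr are hypothetical; c and m mirror the names used in the list.

```sql
-- Hypothetical table with one column per complex type.
CREATE TABLE complex_demo (
  c   STRUCT<a:INT, b:INT>,  -- struct with two INT fields
  m   MAP<STRING, INT>,      -- map from a STRING key to an INT value
  arr ARRAY<STRING>          -- array whose elements are all STRING
);

SELECT
  c.a,         -- dot notation: field a of struct c
  m['group'],  -- map lookup: value stored under the key 'group'
  arr[1]       -- zero-based index: second element of the array
FROM complex_demo;
```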
Using the primitive types and the constructs for creating complex types, types with arbitrary levels of nesting can be created. For example, a type User may comprise the following fields:
- gender, which is a STRING.
- active, which is a BOOLEAN.
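One way to model such a User type is as a STRUCT column, which can itself be nested inside other complex types. The sketch below is illustrative only: the table name accounts and the column names profile and friends are assumptions, not part of the original description.

```sql
-- Hypothetical table: the User type is modeled as a STRUCT,
-- and nested once more inside an ARRAY.
CREATE TABLE accounts (
  id      BIGINT,
  profile STRUCT<gender:STRING, active:BOOLEAN>,
  friends ARRAY<STRUCT<gender:STRING, active:BOOLEAN>>
);

-- Combine the index and dot notations to reach a nested field.
SELECT id, profile.gender, friends[0].gender
FROM accounts
WHERE profile.active = TRUE;
```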
Management
User defined
The typing system is closely tied to the SerDe (Serialization/Deserialization) and object inspector interfaces.
Users can create their own types by implementing their own object inspectors; using these object inspectors, they can create their own SerDes to serialize and deserialize their data into HDFS files.
Built-in object inspectors:
- ListObjectInspector
- StructObjectInspector
- MapObjectInspector
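Once a custom SerDe has been implemented and packaged, it is typically attached to a table through the ROW FORMAT SERDE clause. The sketch below shows only that wiring. The jar path, the class name com.example.hive.EventSerDe, and the table definition are all hypothetical placeholders.

```sql
-- Hypothetical jar containing the custom SerDe and its object inspectors.
ADD JAR /path/to/custom-serde.jar;

-- Hypothetical table whose rows are read and written through the custom SerDe.
CREATE TABLE events (
  id      BIGINT,
  payload STRING
)
ROW FORMAT SERDE 'com.example.hive.EventSerDe'
STORED AS TEXTFILE;
```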
The dotted notation is used to navigate nested types. For example, a.b.c = 1 looks at field c of field b of a and compares it with 1.
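The a.b.c = 1 expression can be tried against a small nested struct. This is a minimal sketch. The table name nested_demo is hypothetical, and the column a is declared here as a struct containing a struct so that the path a.b.c resolves.

```sql
-- Hypothetical table: column a holds a struct b, which holds an INT field c.
CREATE TABLE nested_demo (
  a STRUCT<b:STRUCT<c:INT>>
);

-- Navigate two levels down and compare the innermost field with 1.
SELECT *
FROM nested_demo
WHERE a.b.c = 1;
```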