Pig - Relational Operators
Documentation
See: Relational Operators
LOAD
LOAD reads data from the file system into a relation:
- The input is assumed to be a bag.
- Every dataset is assumed to be a sequence of tuples.
Syntax
A = LOAD 'myFile.txt' USING PigStorage('\t') AS (f1,f2,f3);
where:
- A is a relation
- USING specifies the load function. The default load function is PigStorage.
- AS specifies a schema
- the fields are named f1, f2, f3
Data type: The loader produces data of the types specified by the schema. If the data does not conform to the schema then, depending on the loader, either a null value or an error is generated.
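The null-versus-error behaviour can be sketched with PigStorage, which is generally expected to turn a value it cannot convert into a null rather than fail. The file name badFile.txt and its contents are hypothetical, chosen only for illustration:

```pig
-- Hypothetical tab-separated input badFile.txt; the second row has a non-numeric f2:
--   1 <tab> 2 <tab> 3
--   4 <tab> x <tab> 1
A = LOAD 'badFile.txt' USING PigStorage('\t') AS (f1:int, f2:int, f3:int);

-- Rows whose f2 could not be converted to int should carry a null in that field.
B = FILTER A BY f2 IS NULL;
DUMP B;
```

Filtering on IS NULL is a simple way to find rows that did not conform to the schema. Other loaders may raise an error instead.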
Example
myFile.txt:
1 2 3
4 2 1
8 3 4
Without Schema
A = LOAD 'myFile.txt';
-- is equivalent to
A = LOAD 'myFile.txt' USING PigStorage('\t');
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)
Without schema, the fields are not named and all fields default to type bytearray.
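Because the fields have no names, they are referenced by position ($0, $1, ...), and an explicit cast makes the intended type clear. A minimal sketch, where the aliases first and second_plus_one are made up for this example:

```pig
A = LOAD 'myFile.txt';
-- $0 stays a bytearray; $1 is cast to int before the addition.
B = FOREACH A GENERATE $0 AS first, (int)$1 + 1 AS second_plus_one;
DESCRIBE B;
```

DESCRIBE B should then report a bytearray field and an int field, which shows how types can be recovered downstream when the LOAD statement declares none.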
With Schema
A = LOAD 'myFile.txt' AS (f1:int, f2:int, f3:int);
-- is equivalent to
A = LOAD 'myFile.txt' USING PigStorage('\t') AS (f1:int, f2:int, f3:int);
DESCRIBE A;
A: {f1: int,f2: int,f3: int}