Pig - Relational Operators

Pig Operations Flow

Documentation

See: Relational Operators

LOAD

LOAD reads data into a relation:

  • The input is assumed to be a bag.
  • Every dataset is assumed to be a sequence of tuples.

Syntax

A = LOAD 'myFile.txt' USING PigStorage('\t') AS (f1,f2,f3);

where:

  • A is a relation
  • USING specifies the load function. The default load function is PigStorage.
  • AS specifies a schema
  • the fields are named f1, f2, f3

Data type: The loader produces the data of the type specified by the schema. If the data does not conform to the schema, depending on the loader, either a null value or an error is generated.
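As a minimal sketch of the null case, assuming the default PigStorage loader and a hypothetical tab-delimited file badData.txt in which some f2 values are not integers: PigStorage generally yields null for a field it cannot convert to the declared type, so the affected rows can be isolated with a null test.

```pig
-- badData.txt is a hypothetical file name used only for illustration.
A = LOAD 'badData.txt' USING PigStorage('\t') AS (f1:int, f2:int, f3:int);
-- Keep only the rows where f2 is null (conversion to int failed, or the field was empty).
B = FILTER A BY f2 IS NULL;
DUMP B;
```

Other loaders may raise an error instead, so check the behavior of the loader in use.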

Example

myFile.txt (fields separated by tabs):

1 2 3
4 2 1
8 3 4 

Without Schema

A = LOAD 'myFile.txt';
-- is equivalent to
A = LOAD 'myFile.txt' USING PigStorage('\t');
DUMP A;
(1,2,3)
(4,2,1)
(8,3,4)

Without a schema, the fields are not named and all fields default to type bytearray.
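Because unnamed fields cannot be referenced by name, they are addressed by position instead. A short sketch, assuming the myFile.txt shown above; the relation name B and the alias total are illustrative only:

```pig
A = LOAD 'myFile.txt';
-- $0 is the first field, $1 the second, $2 the third.
-- Cast the bytearray values to int before doing arithmetic on them.
B = FOREACH A GENERATE $0, (int)$1 + (int)$2 AS total;
DUMP B;
```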

With Schema

A = LOAD 'myFile.txt' AS (f1:int, f2:int, f3:int);
-- is equivalent to
A = LOAD 'myFile.txt' USING PigStorage('\t') AS (f1:int, f2:int, f3:int);
DESCRIBE A;
A: {f1: int,f2: int,f3: int}






