Hive - Sample Clause

About

The sampling clause lets users write queries against a sample of the data instead of the whole table. Currently, sampling is done on the clustered columns (i.e. the columns specified in the CLUSTERED BY clause of the CREATE TABLE statement).
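To make "clustered column" concrete, here is a minimal HiveQL sketch of a bucketed table. The table name page_view_sample and its columns are hypothetical, chosen only for illustration; the point is that CLUSTERED BY ... INTO n BUCKETS declares the column and bucket count that bucket sampling relies on.

```sql
-- Hypothetical table for illustration only: the name and columns are assumptions.
CREATE TABLE page_view_sample (
  userid   BIGINT,
  page_url STRING,
  gender   STRING
)
-- Rows are distributed into 32 buckets by hashing userid.
CLUSTERED BY (userid) INTO 32 BUCKETS;

-- Read a single bucket of the 32 instead of scanning the whole table.
SELECT * FROM page_view_sample TABLESAMPLE(BUCKET 1 OUT OF 32 ON userid);
```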

Syntax

The buckets are numbered starting from 0.

In general the TABLESAMPLE syntax looks like:

TABLESAMPLE(BUCKET x OUT OF y [ON colname])

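where a bucket is selected when its number satisfies:

<MATH> \text{BucketNumber} \bmod y = x </MATH>

y has to be a multiple or a divisor of the number of buckets specified when the table was created.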
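As a worked instance of this rule, taken literally and assuming a table with 32 buckets numbered 0 to 31 and the clause TABLESAMPLE(BUCKET 3 OUT OF 16), so x = 3 and y = 16:

<MATH> 3 \bmod 16 = 3 \qquad 19 \bmod 16 = 3 </MATH>

No other bucket number between 0 and 31 leaves a remainder of 3 when divided by 16, so two of the 32 buckets (numbers 3 and 19) are read.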

Example

The table pv_gender_sum has 32 buckets.

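-- y = 32 equals the number of buckets: picks the 3rd bucket only.
SELECT pv_gender_sum.* FROM pv_gender_sum TABLESAMPLE(BUCKET 3 OUT OF 32);
-- y = 16 is a divisor of 32: picks the 3rd and 19th buckets.
SELECT pv_gender_sum.* FROM pv_gender_sum TABLESAMPLE(BUCKET 3 OUT OF 16);
-- y = 64 is a multiple of 32: picks half of the 3rd bucket, sampled on userid.
SELECT pv_gender_sum.* FROM pv_gender_sum TABLESAMPLE(BUCKET 3 OUT OF 64 ON userid);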

Documentation / Reference