Hive - Dynamic Partition (DP)

1 - About

DynamicPartitions (DP) is a partition type where the partitioning columns are only known at EXECUTION TIME (ie within a SQL statement)

Hive will create the partitions automatically.

The only difference is that DP columns do not have values, while SP columns do

3 - Conf

3.1 - Enable


SET hive.exec.dynamic.partition = true -- Needs to be set to true to enable dynamic partition inserts

3.2 - Mode

  • set hive.exec.dynamic.partition.mode=strict - the user must specify at least one static partition in case the user accidentally overwrites all partitions
  • set hive.exec.dynamic.partition.mode=nonstrict - in nonstrict mode all partitions are allowed to be dynamic - you don't need any static partition column.

4 - Example

4.1 - Full Dynamic


INSERT OVERWRITE TABLE T PARTITION (ds, hr)
SELECT key, value, ds, hr FROM srcpart WHERE ds is not null and hr>10;

where:

  • the dynamic partition columns (ie ds, hr) must be specified:
    • last (among the columns in the SELECT statement)
    • in the same order (than in the PARTITION clause)

4.2 - Static and Dynamic

Replace the partition 2010-03-03 (as the partition column order determines directory hierarchy, you cannot set the static value on the second column partition)


INSERT OVERWRITE TABLE T PARTITION (ds='2010-03-03', hr)
SELECT key, value, /*ds,*/ hr FROM srcpart WHERE ds is not null and hr>10;

4.3 - Create table as (CTAS)


CREATE TABLE T (key int, value string) PARTITIONED BY (ds string, hr int) AS
SELECT key, value, ds, hr+1 hr1 FROM srcpart WHERE ds is not null and hr>10;

5 - Documentation / Reference


Data Science
Data Analysis
Statistics
Data Science
Linear Algebra Mathematics
Trigonometry

Powered by ComboStrap