Data Partition - Horizontal

Data System Architecture

About

Horizontal partitions cuts the data by row.

Vertical partition cuts the data by column

Considerations

Partition is what enable parallelism.

  • Do not under partition - Partitioning on columns with only a few values can cause few partitions and therefore few parallel process. For example, partitioning on gender only creates two partitions to be created (male and female), thus only reduce the latency by a maximum of half.
  • Do not over partition - On the other extreme, creating a partition on a column with a unique value (for example, userid) causes multiple partitions. Over partition causes much stress as it has to handle the large number of partitions (implemented as directory, buffer, block)
  • Avoid data skew - Choose your partitioning key wisely or use a hash functions so that all partitions are even size. Otherwise, the parallel thread with the most data will determine the total latency.

Type





Discover More
Data System Architecture
Data Partition - Sharding (Horizontal Partitioning on many server)

Sharding is a synonym for horizontal partitioning on many server. Each partition is located on a server. See also
Data System Architecture
Data Partitions (Clustering of data)

A partition cut out the storage in several part according to a predicate. You can have two types of partition : horizontal (sharding) (related to a cutting by row) vertical (related to a cutting...



Share this page:
Follow us:
Task Runner