Partition is what enable parallelism.
- Do not under partition - Partitioning on columns with only a few values can cause few partitions and therefore few parallel process. For example, partitioning on gender only creates two partitions to be created (male and female), thus only reduce the latency by a maximum of half.
- Do not over partition - On the other extreme, creating a partition on a column with a unique value (for example, userid) causes multiple partitions. Over partition causes much stress as it has to handle the large number of partitions (implemented as directory, buffer, block)