Cassandra - Partition

About

A partition in Cassandra is a unit of storage that does not get divided across nodes.

A partition is an ordered dictionary (ordered by clustering key).
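The ordered-dictionary view can be illustrated with a toy in-memory model. This is only a sketch: the `Partition` class and its `upsert` and `slice` methods are hypothetical names invented for illustration, not part of Cassandra or of any driver.

```python
import bisect


class Partition:
    """Toy model of one partition: rows kept sorted by clustering key."""

    def __init__(self):
        self._keys = []   # clustering keys, always kept in sorted order
        self._rows = {}   # clustering key -> row payload

    def upsert(self, clustering_key, row):
        # A write with an existing clustering key overwrites the row.
        if clustering_key not in self._rows:
            bisect.insort(self._keys, clustering_key)
        self._rows[clustering_key] = row

    def slice(self, start, end):
        # Range read over clustering keys, both bounds inclusive.
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_right(self._keys, end)
        return [(k, self._rows[k]) for k in self._keys[lo:hi]]
```

Because the keys stay sorted, a range read over clustering keys is a contiguous slice rather than a scan of the whole partition.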

A partition is stored on disk in SSTable files and should be kept small (a common guideline is less than 100MB). Small partitions can then be moved easily around the cluster.

To keep a partition small, you can split the data into buckets (i.e. add a bucket id to the partition key and start a new bucket every n days, for instance).
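One possible way to derive such a bucket id is to count whole n-day windows since the Unix epoch. The sketch below assumes a 10-day window; the function name `bucket_id` and the constant `BUCKET_DAYS` are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timezone

BUCKET_DAYS = 10            # assumed window size, tune to your data volume
SECONDS_PER_DAY = 86_400


def bucket_id(ts: datetime, bucket_days: int = BUCKET_DAYS) -> int:
    """Return the number of whole bucket windows elapsed since the Unix epoch."""
    if ts.tzinfo is None:
        # Naive datetimes would be interpreted in local time; reject them.
        raise ValueError("ts must be timezone-aware")
    return int(ts.timestamp()) // (bucket_days * SECONDS_PER_DAY)
```

All rows written within the same window share a bucket id, so a single partition only ever holds n days of data. The application has to compute the same bucket id again when reading.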

The partition key determines:

how much data will be stored in each partition
how the data is organized on disk

It therefore affects how quickly Cassandra processes read queries.

Key

The partition key is made up of one or more columns and is defined as the first element of the primary key.

Range

A partition range is the range of tokens (hashes of the partition key) for which a node (called a replica) owns the data.
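Range ownership can be sketched as a lookup on a sorted token ring, where each node owns the tokens greater than the previous node's token and up to its own, wrapping around at the end. This is a simplified model with one token per node and no replication. The names `token_for` and `owner` are hypothetical, and the hash is a stand-in built on MD5, not the Murmur3 partitioner that Cassandra uses by default.

```python
import bisect
import hashlib


def token_for(partition_key: str) -> int:
    """Stand-in hash: map a partition key to a signed 64-bit token."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


def owner(ring, token):
    """Return the node owning `token`.

    `ring` is a list of (token, node) pairs sorted by token. A node owns
    the range (previous token, its own token], with wrap-around.
    """
    tokens = [t for t, _ in ring]
    i = bisect.bisect_left(tokens, token)   # first node token >= token
    return ring[i % len(ring)][1]           # past the end wraps to the first node
```

For example, with `ring = [(-100, "a"), (0, "b"), (100, "c")]`, the call `owner(ring, token_for("some key"))` returns the name of the node whose range contains that key's token.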

CQL

Example of a table partitioned by (channel_id, bucket), with rows ordered inside each partition by message_id:

CREATE TABLE messages (
   channel_id bigint,
   bucket int,
   message_id bigint,
   author_id bigint,
   content text,
   PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);