Cassandra is a NoSQL database for transactional workloads that require high scale and maximum availability.
It is suited to high-volume transactional workloads and should not be considered a data warehouse.
You don’t start with the data model; you start with the query model.
It is designed to work on a denormalized data model.
Data is arranged as one query per table, and data is repeated amongst many tables, a process known as denormalization.
Each table is designed to satisfy a query that supports a process (user registration, user login, …).
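As a rough illustration of the one-table-per-query idea, the Python sketch below models two tables as plain dictionaries. The table names users_by_id and users_by_email, the register_user helper and the sample data are assumptions made for this example, not part of Cassandra or any driver. The only point is that a single write is repeated into every table that serves a query.

```python
# Each dict stands in for one Cassandra table, keyed by its partition key.
# Table and function names are hypothetical, chosen only for this sketch.
users_by_id = {}     # serves the query "look up a user by id"
users_by_email = {}  # serves the query "look up a user by email" (user login)


def register_user(user_id, email, name):
    """Write the same user into every table that answers a query."""
    row = {"user_id": user_id, "email": email, "name": name}
    users_by_id[user_id] = dict(row)
    users_by_email[email] = dict(row)


register_user(1, "ada@example.com", "Ada")

# Each query reads exactly one table by its key; no join is needed.
print(users_by_id[1]["name"])
print(users_by_email["ada@example.com"]["user_id"])
```

The cost of this design is that every update has to touch all copies, which is the trade-off denormalization accepts in exchange for single-table reads.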
Joins are not supported.
Performing joins on the client should be very rare; duplicate (denormalize) the data instead.
There is no concept of foreign keys or relational integrity.
By contrast, a relational database’s approach to data modeling is table-centric.
Materialized views (available since 3.0) let you create multiple denormalized views of a table's data and keep them in sync with the base table.
A key goal is to minimize the number of partitions that must be scanned in order to satisfy a query. Because the partition is a unit of storage that does not get divided across nodes, a query that searches a single partition will typically yield the best performance.
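The following Python sketch shows why a single-partition query is cheap. It is a simplification under stated assumptions: MD5 with a modulo stands in for Cassandra's partitioner and token ring, replication is ignored, and the node names and helper functions are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical three-node cluster


def node_for(partition_key):
    """Map a partition key to one node.

    Simplification: MD5 plus modulo stands in for Cassandra's
    partitioner and token ring; replication is ignored.
    """
    digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
    token = int.from_bytes(digest[:8], "big")
    return NODES[token % len(NODES)]


def nodes_touched(partition_keys):
    """Return the set of nodes a query over these partitions must contact."""
    return {node_for(key) for key in partition_keys}


# A query restricted to one partition always lands on a single node.
print(nodes_touched([("hotel", 42)]))

# A query spanning many partitions may fan out across the cluster.
print(nodes_touched([("hotel", i) for i in range(100)]))
```

Because the whole partition maps to one token, every row in it is found on the same node, whereas a multi-partition query may need to contact several nodes.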
The sort order available on queries is fixed, and is determined entirely by the selection of clustering columns you supply in the CREATE TABLE command.
The CQL SELECT statement does support ORDER BY semantics, but only in the order specified by the clustering columns (ascending or descending).
The user interface design for the application is often a great artifact to use to begin identifying queries.
You create a logical model containing a table for each query, capturing entities and relationships from the conceptual model.
To name each table, identify the primary entity type for which you are querying, and use that to start the entity name. If you are querying by attributes of other related entities, you append those to the table name, separated with _by_; for example, hotels_by_poi.
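The naming rule can be sketched as a small Python helper. The function name table_name is hypothetical, and joining several attributes with a single underscore after _by_ is an assumption of this sketch; the text above only shows the single-attribute case.

```python
def table_name(primary_entity, by_attributes=()):
    """Build a table name such as hotels_by_poi from its parts.

    Assumption: several attributes are joined with "_" after one "_by_".
    """
    if not by_attributes:
        return primary_entity
    return primary_entity + "_by_" + "_".join(by_attributes)


print(table_name("hotels", ["poi"]))  # hotels_by_poi
print(table_name("hotels"))           # hotels
```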
Next, identify the primary key for the table with:
If any of the other additional attributes are the same for every instance of the partition key, mark the column as static. For instance, if a name is in the partition key, you could add a description attribute as a static column.
The diagram notation by Artem Chebotko helps visualize the relationships between queries and tables:
If you’re searching by attribute, this attribute should be a part of the primary key.
It's not a KV (key-value) store but a KKV store: the first key is the partition key and the second is the clustering key.
Example:
CREATE TABLE messages (
    channel_id bigint,
    bucket int,
    message_id bigint,
    author_id bigint,
    content text,
    PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);
where:
PRIMARY KEY (
(partitionCol1, ..., partitionColN),
clusteringCol1, ..., clusteringColN
)
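To make the two-level key concrete, here is a minimal Python model of the messages table above. It is only a sketch: the dictionary layout and the helper names insert_message and latest_messages are assumptions, and the bucket value is taken as given because the text does not say how it is derived.

```python
from collections import defaultdict

# Partition key (channel_id, bucket) -> rows stored in that partition.
partitions = defaultdict(list)


def insert_message(channel_id, bucket, message_id, author_id, content):
    """Store a row, keeping each partition sorted by message_id DESC."""
    rows = partitions[(channel_id, bucket)]
    # Same primary key means the same row: replace instead of duplicating.
    rows[:] = [r for r in rows if r["message_id"] != message_id]
    rows.append(
        {"message_id": message_id, "author_id": author_id, "content": content}
    )
    rows.sort(key=lambda r: r["message_id"], reverse=True)


def latest_messages(channel_id, bucket, limit):
    """Read one partition; rows come back with the highest message_id first."""
    return partitions[(channel_id, bucket)][:limit]


insert_message(1, 0, 100, 7, "first")
insert_message(1, 0, 102, 8, "third")
insert_message(1, 0, 101, 7, "second")
insert_message(1, 1, 200, 9, "other bucket")

print([m["message_id"] for m in latest_messages(1, 0, 2)])  # [102, 101]
```

The partition key selects which list to read, and the clustering column fixes the order inside it, so the read needs no sorting at query time.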
Cassandra supports modelling time series data, where each row can have a dynamic number of columns.
In time series data models, CLUSTERING ORDER BY can store the newest rows first, so that a LIMIT query returns the latest entries. As an example:
PRIMARY KEY (userid, added_date, videoid)
SELECT * FROM user_videos WHERE userid = 522b1fe2-2e36-4cef-a667-cd4237d08b89 LIMIT 10;
docker run ^
--name cassandra ^
-d ^
cassandra:3.11.5
docker exec -it cassandra bash
cqlsh localhost
SELECT cluster_name, listen_address FROM system.local;
 cluster_name | listen_address
--------------+----------------
 Test Cluster | 172.17.0.3
create keyspace mySchema with replication = {'class':'SimpleStrategy','replication_factor':1};
use mySchema;
CREATE TABLE t (
    pk int,
    t int,
    v text,
    s text static,
    PRIMARY KEY (pk, t)
);
INSERT INTO t (pk, t, v, s) VALUES (0, 0, 'val0', 'static0');
INSERT INTO t (pk, t, v, s) VALUES (0, 1, 'val1', 'static1');
SELECT * FROM t;
 pk | t | s       | v
----+---+---------+------
  0 | 0 | static1 | val0
  0 | 1 | static1 | val1
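The result above, where both rows show static1, can be reproduced with a small Python model of a static column. This is a sketch of the semantics only, assuming one shared value per partition; the Partition class and the insert and select_all helpers are hypothetical names.

```python
class Partition:
    """One partition of table t: a shared static value plus clustered rows."""

    def __init__(self):
        self.static_s = None  # column s: one value for the whole partition
        self.rows = {}        # clustering column t -> regular column v


table = {}  # partition key pk -> Partition


def insert(pk, t, v, s):
    part = table.setdefault(pk, Partition())
    part.rows[t] = v
    part.static_s = s  # overwrites the value seen by every row in the partition


def select_all():
    """Yield (pk, t, s, v) tuples in clustering order within each partition."""
    for pk, part in table.items():
        for t in sorted(part.rows):
            yield (pk, t, part.static_s, part.rows[t])


insert(0, 0, "val0", "static0")
insert(0, 1, "val1", "static1")

for row in select_all():
    print(row)
# (0, 0, 'static1', 'val0')
# (0, 1, 'static1', 'val1')
```

The second insert replaces the static value for the partition, which is why the first row no longer shows static0.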