Relational Data Modeling - Aggregate Table (Summary Table)

About

An Aggregate table is a persistence unit of aggregate data.

Aggregate tables are also known as summary table (from summary)

A materialized view can be used to implement aggregate data

A aggregate is a table that contains the result of a query (generally an aggregated one but not always …).

Articles Related

Usage

Query Performance

Data - Cache - Pre-aggregating and storing additive information is the standard practice for improving the query performance of databases.

An aggregate table is for an analytics system what an index is for an OLTP system.

When users request information at a high grain of aggregation, Pre-aggregating and storing additive information is the standard practice for improving the query performance of relational databases.

Query Rewriting

When a user or an application asks for a column in a fact table, a sql engine that is aware of the aggregate table is capable of rewrite the query against/using the aggregate table. See SQL Engine - Query Rewrite

When data becomes bigger

Parallel

To improve the throughput, you can run the process in parallel by adding more

hardware (more cpu, memory, disk)
node (ie computer) - in distributed computing such as Hadoop.

Incremental Processing

Stream - How to capture change of state - Change Data Capture (CDC)

Implementation

You can implement an aggregate table with:

a full load or incremental dataflow that executes or implements an aggregate sql insert into … select dim1, dim2, sum(add1) from fact group by dim1, dim2)
or a materialized view

Documentation / Reference

https://www.slideshare.net/julianhyde/data-profiling-with-apache-calcite