Data Partition - Row or Column Data Store

1 - The problem

Most data warehouses either use a star schema (cube), the central fact table in such a schema usually has many attributes. 50 attributes is very common and 200 is not unusual. So assume a fact table with 60 attributes.

However, the query from a business analyst invariably only accesses 3 or 4 of these 60 attributes.

2 - Physical

A row store will read all 60 attributes because the attributes needed are intermixed with all of the others.

In contrast, a column store will read only the relevant attributes. This will result in a factor of 15-20 less I/O activity.

In addition, a column store has other benefits. Compression is more effective in a column store than a row store, because any storage block only contains one kind of thing. In contrast a storage block contains many different kinds of objects. Obviously, it is easier to compress one thing than many things. A column compression is typically a factor of 3 better than a row compression. This further lowers I/O activity in a column store.

In a columnar data stores, the physical storage components vertically partition the entire model, there is no physical concept of “table” just a logical one where a table structure is made up of columns.

4 - Reference

Data Science
Data Analysis
Data Science
Linear Algebra Mathematics

Powered by ComboStrap