Data Partition - Row or Column Data Store

Data System Architecture

The problem

Most data warehouses either use a star schema (cube), the central fact table in such a schema usually has many attributes. 50 attributes is very common and 200 is not unusual. So assume a fact table with 60 attributes.

However, the query from a business analyst invariably only accesses 3 or 4 of these 60 attributes.

Physical

A row store will read all 60 attributes because the attributes needed are intermixed with all of the others.

In contrast, a column store will read only the relevant attributes. This will result in a factor of 15-20 less I/O activity.

In addition, a column store has other benefits. Compression is more effective in a column store than a row store, because any storage block only contains one kind of thing. In contrast a storage block contains many different kinds of objects. Obviously, it is easier to compress one thing than many things. A column compression is typically a factor of 3 better than a row compression. This further lowers I/O activity in a column store.

In a columnar data stores, the physical storage components vertically partition the entire model, there is no physical concept of “table” just a logical one where a table structure is made up of columns.

Reference





Discover More
Data System Architecture
Column Family Store

s are NoSql store that clusters the data by a set of key columns. The data is then partitioned / distributed across multiple machines according to the key columns. Storage is sparse since only columns...
Data System Architecture
Data Partition - Vertical partitioning

Create two tables with a column primary key Create a view joining this two tables through this key Selecting only columns of one tables must select only one table (pruning the second one). A solution...
Data System Architecture
Data Partitions (Clustering of data)

A partition cut out the storage in several part according to a predicate. You can have two types of partition : horizontal (sharding) (related to a cutting by row) vertical (related to a cutting...
Data System Architecture
Data Warehouse

A data warehouse is a large central data repository of current, history and summarised data coming from operational and external sources used primarily for analysis. s is large historical databases for...
Data System Architecture
Data Warehouse - Fast Table Scan

Ways to Scan a Fact table Faster Employ Parallelism If you have the resources, then let divide the table into chunks and use all your system resources to scan it. Unfriendly in an OLTP environment,...
Data System Architecture
SQL - Query Performance

The query performance is the of a query. OLTP query are query that comes from an OLTP application. This kind of query retrieve few rows and their performance is generally improved with a index....



Share this page:
Follow us:
Task Runner