Dimensional Data Modeling - What means Data is dense/sparse ?

Star Schema

About

Sparse and dense are a storage property of the values of an attribute.

Sparse is better than Dense

Definition

Sparse

Data is normally stored in sparse form. If no event has happened, there is no data stored.

In dimensional modeling, if no value exists for a given combination of dimension values, no row exists in the fact table.

For example, if not every product is sold in every market. In this case, Market and Product are sparse dimensions.

It's why in the reporting tool Obiee for instance, by default, data are considered sparse.

Dense

Most multidimensional databases may also contain dense dimensions. A fact table is considered to have dense data if it has (of a high probability to have) one row for every combination of its associated dimension levels.

Example of sparse data

A typical situation in the SH schema with a sparse dimension is shown in the following example, which computes the weekly sales and year-to-date sales for the product Bounce for weeks 20-30 in 2000 and 2001:

SELECT SUBSTR(p.Prod_Name,1,15) Product_Name, 
       t.Calendar_Year          Year,
       t.Calendar_Week_Number   Week, 
       SUM(Amount_Sold)         Sales
FROM   
       Sales s, 
       Times t, 
       Products p
WHERE  
       s.Time_id = t.Time_id AND 
       s.Prod_id = p.Prod_id AND
       p.Prod_name IN ('Bounce') AND 
       t.Calendar_Year IN (2000,2001) AND
       t.Calendar_Week_Number BETWEEN 20 AND 30
GROUP BY 
       p.Prod_Name, 
       t.Calendar_Year, 
       t.Calendar_Week_Number
order by 
       product_name, 
       year, 
       week asc
PRODUCT_NAME          YEAR       WEEK      SALES
--------------- ---------- ---------- ----------
Bounce                2000         20        801
Bounce                2000         21    4062.24
Bounce                2000         22    2043.16
Bounce                2000         23    2731.14
Bounce                2000         24    4419.36
---- Miss one row for the week 25
---- Miss one row for the week 26
Bounce                2000         27    2297.29
Bounce                2000         28    1443.13
Bounce                2000         29    1927.38
Bounce                2000         30    1927.38
Bounce                2001         20     1483.3
Bounce                2001         21    4184.49
Bounce                2001         22    2609.19
Bounce                2001         23    1416.95
Bounce                2001         24    3149.62
Bounce                2001         25    2645.98
---- Miss one row for the week 26
Bounce                2001         27    2125.12
---- Miss one row for the week 28
Bounce                2001         29    2467.92
Bounce                2001         30    2620.17

In this example, we would expect 22 rows of data (11 weeks each from 2 years) if the data were dense. However we get only 18 rows because weeks 25 and 26 are missing in 2000, and weeks 26 and 28 in 2001.

Data Densification/Completeness (sparse to dense data)

You may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them.

For example, if a product did not sell during a particular time period, you may still want to see the product for that time period with zero sales value next to it.

ROLAP

If you are in a relational environment, you must perform a SQL densification: Continue to read …

MOLAP

In a MOLAP environement, sparse data are not present.

In relational tables, records are only present for data that exists, whereas in a MOLAP cubes, an empty value is used where no data exists. The cube stores the real data values and whereas the empty values are easily addressable and queryable, they are not actually stored in the cube.

For example, you can easily refer to last month’s data value in a query even if that month has no data, but the empty value is not stored.

This highlights the very important feature of MOLAP cubes in that they operate as if they are fully populated with both actual data and zero data for all combinations possible of their attribute dimensions values.

This can have a very important benefit because it makes defining calculations easier as it can be assumed that all data points are present in the cube.

For example, the following formula can be defined even if there is no actual value, or physical storage used, for Tents in Feb2002:

nvl(sales('Feb2002','Tents'),0) - nvl(sales('Jan2002', 'Tents'), 0)

If the reporting requirements need to process null value rows, serious consideration should be given to using a MOLAP solution because they can handle this information by default rather than having to write a more complex densification SQL statement to generate the same result.





Discover More
Obiee Logical Sql Densification
Blog - Obiee 10G - Densification with the OBIEE logical Sql (Sparse to dense data)

You may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them. And it’s also the most problem that you have when you...
Obiee Period To Period Comparison Lag
Blog - Obiee 10G – Period to period comparison with the analytical function Lag/Lead

During this thread on the Obiee forum, Goran point out a solution for a period to period comparison...
Testing Infrastructure
Code - Testing (Software Quality Assurance|SQA|Validator|Checker)

A test is performed to verify that the system is conformed to the specification and is the most important part of code quality. In a “Test-driven_developmenttest-driven software development (tdd)”...
Data System Architecture
Column Family Store

s are NoSql store that clusters the data by a set of key columns. The data is then partitioned / distributed across multiple machines according to the key columns. Storage is sparse since only columns...
Thomas Bayes
Data Mining - Orthogonal Partitioning Clustering (O-Cluster or OC) algorithm

O-Cluster creates a hierarchical, grid-based clustering model. This Unsupervised algorithm creates clusters that define dense areas in the attribute space. A sensitivity parameter defines the baseline...
Excel Transpose Pivot
Dimensional Data Operation - (Pivot|Transpose|Cross-tab|Matrix)

The pivot is an dimensional data operation where rows and columns are interchanged. Pivot is also known as: Transpose in the linear algebra word Matrix in the Microsoft world Cross-tab This operation...
Data System Architecture
Dimensional Data Operation - Data Densification (sparse to dense data) - Preservation of the dimensions

You may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them. The densification is also : known as thepreservation...
Essbase Overview
Essbase - Dense and Sparse Dimension

Most data sets of multidimensional databases have two characteristics: Data is not smoothly and uniformly distributed. Data does not exist for the majority of member combinations. For example, all...
Auto Placed Form
How works CSS grid Auto-placement ?

In a grid, the value auto : will place the Css grid item into the next available empty grid cell (growing the grid if there’s no space left) is driven by the placement algorithm is controlled...
Image Vector
Linear Algebra - Vector

tuple in Linear algebra are called vector. A vector is a list of scalar (real number) used to represent a When the letters are in bold in a formula, it signifies that they're vectors, To represent...



Share this page:
Follow us:
Task Runner