Dimensional Data Operation - Data Densification (sparse to dense data) - Preservation of the dimensions

Data System Architecture

About

You may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them.

The densification is also :

  • known as the preservation of the dimensions
  • describe as show me the null values

For example, if a product did not sell during a particular time period, you may still want to see the product for that time period with zero sales or null value next to it.

Moreover, time series calculations can be performed most easily when data is dense along the time dimension.

This is because dense data will fill a consistent number of rows for each period, which in turn makes it simple to use the analytic windowing functions with physical offsets.

Data densification is the process of converting sparse data into dense form.

Sparse data to densify

A typical situation in the sh schema with a sparse dimension is shown in the following example, which computes the weekly sales and year-to-date sales for the product Bounce for weeks 20-30 in 2000 and 2001:

SELECT SUBSTR(p.Prod_Name,1,15) Product_Name, 
       t.Calendar_Year          Year,
       t.Calendar_Week_Number   Week, 
       SUM(Amount_Sold)         Sales
FROM   
       Sales s, 
       Times t, 
       Products p
WHERE  
       s.Time_id = t.Time_id AND 
       s.Prod_id = p.Prod_id AND
       p.Prod_name IN ('Bounce') AND 
       t.Calendar_Year IN (2000,2001) AND
       t.Calendar_Week_Number BETWEEN 20 AND 30
GROUP BY 
       p.Prod_Name, 
       t.Calendar_Year, 
       t.Calendar_Week_Number
order by 
       product_name, 
       year, 
       week asc
PRODUCT_NAME          YEAR       WEEK      SALES
--------------- ---------- ---------- ----------
Bounce                2000         20        801
Bounce                2000         21    4062.24
Bounce                2000         22    2043.16
Bounce                2000         23    2731.14
Bounce                2000         24    4419.36
---- Miss one row for the week 25
---- Miss one row for the week 26
Bounce                2000         27    2297.29
Bounce                2000         28    1443.13
Bounce                2000         29    1927.38
Bounce                2000         30    1927.38
Bounce                2001         20     1483.3
Bounce                2001         21    4184.49
Bounce                2001         22    2609.19
Bounce                2001         23    1416.95
Bounce                2001         24    3149.62
Bounce                2001         25    2645.98
---- Miss one row for the week 26
Bounce                2001         27    2125.12
---- Miss one row for the week 28
Bounce                2001         29    2467.92
Bounce                2001         30    2620.17

In this example, we would expect 22 rows of data (11 weeks each from 2 years) if the data were dense. However we get only 18 rows because weeks 25 and 26 are missing in 2000, and weeks 26 and 28 in 2001.

Densification : A right outer join : the first step

To overcome the problem of the sparsity, the first idea is to perform an outer join with the time table to fill the gap

SELECT Product_Name, t.Year, t.Week, NVL(Sales,0) dense_sales
FROM
 (SELECT SUBSTR(p.Prod_Name,1,15) Product_Name,
  t.Calendar_Year Year, t.Calendar_Week_Number Week, SUM(Amount_Sold) Sales
  FROM Sales s, Times t, Products p
  WHERE s.Time_id = t.Time_id AND s.Prod_id = p.Prod_id AND
   p.Prod_name IN ('Bounce') AND t.Calendar_Year IN (2000,2001) AND
   t.Calendar_Week_Number BETWEEN 20 AND 30
  GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number) v
RIGHT OUTER JOIN
 (SELECT DISTINCT Calendar_Week_Number Week, Calendar_Year Year
  FROM Times
  WHERE Calendar_Year IN (2000, 2001)
  AND Calendar_Week_Number BETWEEN 20 AND 30) t
ON (v.week = t.week AND v.Year = t.Year)
ORDER BY t.year, t.week;

Unfortunately, the product name is missing and we have to go further to find a solution.

Result :

PRODUCT_NAME          YEAR       WEEK DENSE_SALES
--------------- ---------- ---------- -----------
Bounce                2000         20        1602
Bounce                2000         21     8124.48
Bounce                2000         22     4086.32
Bounce                2000         23     5462.28
Bounce                2000         24     8838.72
                      2000         25           0
                      2000         26           0
Bounce                2000         27     4594.58
Bounce                2000         28     2886.26
Bounce                2000         29     3854.76
Bounce                2000         30     3854.76
Bounce                2001         20      2966.6
Bounce                2001         21     8368.98
Bounce                2001         22     5218.38
Bounce                2001         23      2833.9
Bounce                2001         24     6299.24
Bounce                2001         25     5291.96
                      2001         26           0
Bounce                2001         27     4250.24
                      2001         28           0
Bounce                2001         29     4935.84
Bounce                2001         30     5240.34

Densification with partitioned outer join

To overcome the problem of sparsity and to fill the product column, Oracle recommend to use a partitioned outer join.

During a partition outer join, the database :

  • partitions the rows in your query based on the expression you specify in the PARTITION BY clause (in our example by product name)
  • then apply the outer join on each logical partition defined in the query.
  • and retrieve the result by performing an UNION of the outer joins of each of the partitions in the logically partitioned table on the other side of the join.
SELECT Product_Name, t.Year, t.Week, NVL(Sales,0) dense_sales
FROM
 (SELECT SUBSTR(p.Prod_Name,1,15) Product_Name,
  t.Calendar_Year Year, t.Calendar_Week_Number Week, SUM(Amount_Sold) Sales
  FROM Sales s, Times t, Products p
  WHERE s.Time_id = t.Time_id AND s.Prod_id = p.Prod_id AND
   p.Prod_name IN ('Bounce') AND t.Calendar_Year IN (2000,2001) AND
   t.Calendar_Week_Number BETWEEN 20 AND 30
  GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number) v
PARTITION BY (v.Product_Name)
RIGHT OUTER JOIN
 (SELECT DISTINCT Calendar_Week_Number Week, Calendar_Year Year
  FROM Times
  WHERE Calendar_Year IN (2000, 2001)
  AND Calendar_Week_Number BETWEEN 20 AND 30) t
ON (v.week = t.week AND v.Year = t.Year)
ORDER BY t.year, t.week;

The result :

PRODUCT_NAME          YEAR       WEEK DENSE_SALES
--------------- ---------- ---------- -----------
Bounce                2000         20        1602
Bounce                2000         21     8124.48
Bounce                2000         22     4086.32
Bounce                2000         23     5462.28
Bounce                2000         24     8838.72
Bounce                2000         25           0
Bounce                2000         26           0
Bounce                2000         27     4594.58
Bounce                2000         28     2886.26
Bounce                2000         29     3854.76
Bounce                2000         30     3854.76
Bounce                2001         20      2966.6
Bounce                2001         21     8368.98
Bounce                2001         22     5218.38
Bounce                2001         23      2833.9
Bounce                2001         24     6299.24
Bounce                2001         25     5291.96
Bounce                2001         26           0
Bounce                2001         27     4250.24
Bounce                2001         28           0
Bounce                2001         29     4935.84
Bounce                2001         30     5240.34

Nevertheless, this clause can not be always available and we can see a partition outer join as a outer join between the sparse data and a cross join data set of the dimension. See the following paragraph for an example.

Densification with cross join and outer join

A cross join between the dimensions is always dense. The idea is therefore to perform it first and then outer join it with the sparse data.

First, the cross join :

SELECT product_name, week, year from
 (SELECT DISTINCT Calendar_Week_Number Week, Calendar_Year Year 
   FROM Times 
  WHERE Calendar_Year IN (2000, 2001)  AND 
        Calendar_Week_Number BETWEEN 20 AND 30) t,
 (SELECT SUBSTR(p.Prod_Name,1,15) Product_Name 
   FROM Products p 
  WHERE p.Prod_name IN ('Bounce'))

The result :

PRODUCT_NAME          WEEK       YEAR
--------------- ---------- ----------
Bounce                  29       2000
Bounce                  21       2001
Bounce                  26       2001
Bounce                  29       2001
Bounce                  30       2001
Bounce                  20       2001
Bounce                  24       2001
Bounce                  23       2000
Bounce                  27       2000
Bounce                  30       2000
Bounce                  28       2001
Bounce                  25       2000
Bounce                  26       2000
Bounce                  22       2001
Bounce                  25       2001
Bounce                  27       2001
Bounce                  21       2000
Bounce                  22       2000
Bounce                  23       2001
Bounce                  20       2000
Bounce                  24       2000
Bounce                  28       2000

And then an outer join to retrieve the amount sold.

SELECT cross_join.Product_Name, cross_join.Year, cross_join.Week, NVL(data_to_densify.Sales,0) dense_sales
FROM
 (SELECT SUBSTR(p.Prod_Name,1,15) Product_Name,
  t.Calendar_Year Year, t.Calendar_Week_Number Week, SUM(Amount_Sold) Sales
  FROM Sales s, Times t, Products p
  WHERE s.Time_id = t.Time_id AND s.Prod_id = p.Prod_id AND
   p.Prod_name IN ('Bounce') AND t.Calendar_Year IN (2000,2001) AND
   t.Calendar_Week_Number BETWEEN 20 AND 30
  GROUP BY p.Prod_Name, t.Calendar_Year, t.Calendar_Week_Number) data_to_densify
RIGHT OUTER JOIN
(
  SELECT week, year, product_name from
 (SELECT DISTINCT Calendar_Week_Number Week, Calendar_Year Year 
   FROM Times 
  WHERE Calendar_Year IN (2000, 2001)  AND 
        Calendar_Week_Number BETWEEN 20 AND 30) t,
 (SELECT SUBSTR(p.Prod_Name,1,15) Product_Name 
   FROM Products p 
  WHERE p.Prod_name IN ('Bounce') ) p
) cross_join
ON (
        data_to_densify.product_Name = cross_join.Product_Name AND 
        data_to_densify.week = cross_join.week AND 
        data_to_densify.Year = cross_join.Year
   )
ORDER BY year, week

In our example above, a right outer join is used but of course all outer joins as left and full can be used instead.

The result :

PRODUCT_NAME          YEAR       WEEK DENSE_SALES
--------------- ---------- ---------- -----------
Bounce                2000         20        1602
Bounce                2000         21     8124.48
Bounce                2000         22     4086.32
Bounce                2000         23     5462.28
Bounce                2000         24     8838.72
Bounce                2000         25           0
Bounce                2000         26           0
Bounce                2000         27     4594.58
Bounce                2000         28     2886.26
Bounce                2000         29     3854.76
Bounce                2000         30     3854.76
Bounce                2001         20      2966.6
Bounce                2001         21     8368.98
Bounce                2001         22     5218.38
Bounce                2001         23      2833.9
Bounce                2001         24     6299.24
Bounce                2001         25     5291.96
Bounce                2001         26           0
Bounce                2001         27     4250.24
Bounce                2001         28           0
Bounce                2001         29     4935.84
Bounce                2001         30     5240.34

Reference





Recommended Pages
Obiee Logical Sql Densification
Blog - Obiee 10G - Densification with the OBIEE logical Sql (Sparse to dense data)

You may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them. And it’s also the most problem that you have when you...
Star Schema
Dimensional Data Modeling - Fact Table

A fact table is a table that stores fact where a fact can be seen as the representation of an event. It's also known as: Transactional history Unalterable fact. (The content of this kind of table...
Data System Architecture
Dimensional Data Modeling - What means Data is dense/sparse ?

Sparse and dense are a storage property of the values of an attribute. Data is normally stored in sparse form. If no event has happened, there is no data stored. In dimensional modeling, if...
Obiee Dense Analytics By Members
OBIEE - Densification / Dimensions Preservation Possibilities

You may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them. A MOLAP environment give you automatically this feature...
Obiee Logical Sql Densification
OBIEE - Densification with logical Sql

You may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them. And it's also the most problem that you have when you work...
Obiee Fact Cross Join
OBIEE - Densification with the fact-based fragmentation capabilities

You may want to view the data in dense form, with rows for all combination of dimension values displayed even when no fact data exist for them. The preservation of the dimensions is is also well known...
Card Puncher Data Processing
Oracle Database - SQL - First Value Analytic function

FIRST_VALUE is an non-deterministic analytic function. It returns the first value in an ordered set of values. If the first value in the set is null, then the function returns NULL unless you specify...



Share this page:
Follow us:
Task Runner