TPC-DS - Schema

Tpc Ds Data Flow

About

The TPC-DS schema is a snowflake schema. It consists of multiple dimension and fact tables. Each dimension has a single column surrogate key. The fact tables join with dimensions using each dimension table's surrogate key.

Table List

Fact

The schema includes seven fact tables:

  • A pair of fact tables focused on the product sales and returns for each of the three channels. ie:
    • Store Sales and returns
    • Catalog Sales and returns
    • Web Sales and returns
  • A single fact table that models inventory for the catalog and internet sales channels.
    • Inventory
Type Name Abbreviation
Fact Store Sales SS
Fact Store Returns SR
Fact Catalog Sales CS
Fact Catalog Returns CR
Fact Web Sales WS
Fact Web Returns WR
Fact Inventory INV

Dimension

In addition, the schema includes 17 dimension tables that are associated with all sales channels. The following clauses specify the logical design of each table:

  • The name of the table, along with its abbreviation (listed parenthetically)
  • A logical diagram of each fact table and its related dimension tables
  • The high-level definitions for each table and its relationship to other tables
  • The scaling and cardinality information for each column
Type Name Abbreviation
Dimension Store S
Dimension Call Center CC
Dimension Catalog Page CP
Dimension Web Site WEB
Dimension Web Page WP
Dimension Warehouse W
Dimension Customer C
Dimension Customer Address CA
Dimension Customer Demographics CD
Dimension Date Dim D
Dimension Household Demographics D
Dimension Item I
Dimension Income Band IB
Dimension Promotion P
Dimension Reason R
Dimension Ship Mode SM
Dimension Time Dim T
Dimension dsdgen_version NA

Snowflakes

Store

Tpcds Sales Er Diagram

Tpcds Store Er Diagram

Catalog

Tpcds Catalog Sales Er Diagram

Tpcds Catalog Return Er Diagram

Web

Tpcds Web Sales Er Diagram

Tpcds Web Return Er Diagram

Documentation / Reference





Discover More
Card Puncher Data Processing
Hive - CSV

CSV / TSV format in Hive. You can create a external table with: the Open Csv Serde or with the default TEXTFILE. See Example with the customer table of the TPCDS schema STORED AS TEXTFILE...
Tpcds Customer Data
Hive - Open Csv Serde

The Csv Serde is a serde that is applied above a text file. It's one way of reading a CSV / TSV format. The CSVSerde is available in Hive 0.14 and greater. Origin: ....
Card Puncher Data Processing
Hive - Text File (TEXTFILE)

TEXTFILE is the default storage format of a table STORED AS TEXTFILE is normally the storage format and is then optional. The delimiters are assumed to be ^A(ctrl-a). Example with the customer...
Tpc Ds Data Flow
TPC - DS

TPC-DS was designed to be representative of a traditional report-based workload. TPC-DS models the decision support functions of a retail product supplier. TPC-DS does not benchmark...
Tpc Ds Data Maintenance
TPC-DS - Data Refresh (Data Maintenance or DM)

A Data Maintenance Test consists of the execution of a series of refresh streams. This process tracks, possibly with some delay, the state of an operational database through data maintenance functions,...
Tpc Ds Data Flow
TPC-DS - Query

A query is an ordered set of one or more valid SQL statements resulting from applying the required parameter substitutions to a given query template. The order of the SQL statements is defined in the...



Share this page:
Follow us:
Task Runner