TPC - DS

About

TPC-DS was designed to be representative of a traditional report-based workload.

TPC-DS models the decision support functions of a retail product supplier. TPC-DS does not benchmark the operational systems.

The benchmark models the two most important components of any mature decision support system:

  • User queries, which convert operational facts into business intelligence.
  • Data maintenance, which synchronizes the process of management analysis with the operational external data source on which it relies.

The workload imitates the activity of a multi-channel retailer; thus tracking store, web and catalog sales channels.

The TPC-DS benchmark, models the challenges of business intelligence systems where operational data is used both:

  • to support the making of business decisions in near real time
  • and to direct long-range planning and exploration.

The Benchmark is defined as the execution of the Load Test followed by the Performance Test.

Characteristic

  • Examine large volumes of data;
  • Execute queries of various operational requirements and complexities (e.g., ad-hoc, reporting, iterative, OLAP, data mining);
  • Are characterized by high CPU and IO load;
  • Are periodically synchronized with source OLTP databases through database maintenance functions.
  • multi-user
  • Run on “Big Data” solutions, such as RDBMS as well as Hadoop/Spark based systems.

Prerequisites

  • queries must be executed via SQL interface

Business

TPC-DS imitates the activity of a business model that:

  • manage, sell and distribute products (e.g., food, electronics, furniture, music and toys etc.).
  • have multiple stores located nation-wide.
  • is a multi sales channel retailer selling goods through:
    • store,
    • web
    • and catalog sales channels.

All three channel sub-systems are autonomous and retain possibly redundant information regarding customers, addresses, etc.

Tpc Ds Data Flow

System:

  • sales and returns,
  • inventory system
  • promotion system.

The following are examples of business processes of this retail company:

  • Record customer purchases (and track customer returns) from any sales channel
  • Modify prices according to promotions
  • Maintain warehouse inventory
  • Create dynamic web pages
  • Maintain customer profiles (Customer Relationship Management)

Model

TPC-DS’ modeling of the business environment falls into three broad categories:

  • Data Model and Data Access Assumptions
  • Query and User Model Assumptions
  • Data Maintenance Assumptions

Data Maintenance (Upsert)

see TPC-DS - Data Refresh (Data Maintenance or DM)

Schema

TPC-DS - Schema

Configuration

  • Locking level: To achieve the optimal compromise between performance and operational consistency, the system administrator can set, once and for all, the locking levels and the concurrent scheduling rules for queries and data maintenance functions.

Paths and artifacts

Content File Name Location Usage
Data generator dsdgen Used to generate the data sets for the benchmark
Query generator dsqgen Used to generate the query sets for the benchmark
Query Templates query_templates/ Used by dsqgen to generate executable query text
Query Template Variants query_variants/ Used by dsqgen to generate alternative executable query text
Table definitions in ANSI SQL tpcds.sql, tpcds_source.sql Sample implementation of the logical schema for the data warehouse.
Data Maintenance Functions in ANSI SQL data_maintenance/ Sample implementation of the SQL needed for the Data Maintenance phase of the benchmark
Answer Sets answer_sets/ Used to verify the initial population of the data warehouse.
Reference Data Set run dsdgen with – validate flag Set of files for each scale factor to compare the correct data generation of base data.

Blog

Code

Documentation / Reference

Task Runner