TPC - DS
About
TPC-DS was designed to be representative of a traditional report-based workload.
TPC-DS models the decision support functions of a retail product supplier. TPC-DS does not benchmark the operational systems.
The benchmark models the two most important components of any mature decision support system:
- User queries, which convert operational facts into business intelligence.
- Data maintenance, which synchronizes the process of management analysis with the operational external data source on which it relies.
The workload imitates the activity of a multi-channel retailer; thus tracking store, web and catalog sales channels.
The TPC-DS benchmark, models the challenges of business intelligence systems where operational data is used both:
- to support the making of business decisions in near real time
- and to direct long-range planning and exploration.
The Benchmark is defined as the execution of the Load Test followed by the Performance Test.
Articles Related
Characteristic
- Examine large volumes of data;
- Execute queries of various operational requirements and complexities (e.g., ad-hoc, reporting, iterative, OLAP, data mining);
- Are characterized by high CPU and IO load;
- Are periodically synchronized with source OLTP databases through database maintenance functions.
- multi-user
- Run on “Big Data” solutions, such as RDBMS as well as Hadoop/Spark based systems.
Prerequisites
- queries must be executed via SQL interface
Business
TPC-DS imitates the activity of a business model that:
- manage, sell and distribute products (e.g., food, electronics, furniture, music and toys etc.).
- have multiple stores located nation-wide.
- is a multi sales channel retailer selling goods through:
- store,
- web
- and catalog sales channels.
All three channel sub-systems are autonomous and retain possibly redundant information regarding customers, addresses, etc.
System:
- sales and returns,
- inventory system
- promotion system.
The following are examples of business processes of this retail company:
- Record customer purchases (and track customer returns) from any sales channel
- Modify prices according to promotions
- Maintain warehouse inventory
- Create dynamic web pages
- Maintain customer profiles (Customer Relationship Management)
Model
TPC-DS’ modeling of the business environment falls into three broad categories:
- Data Model and Data Access Assumptions
- Query and User Model Assumptions
- Data Maintenance Assumptions
Data Maintenance (Upsert)
Schema
Configuration
- Locking level: To achieve the optimal compromise between performance and operational consistency, the system administrator can set, once and for all, the locking levels and the concurrent scheduling rules for queries and data maintenance functions.
Paths and artifacts
Content File Name | Location | Usage |
---|---|---|
Data generator | dsdgen | Used to generate the data sets for the benchmark |
Query generator | dsqgen | Used to generate the query sets for the benchmark |
Query Templates | query_templates/ | Used by dsqgen to generate executable query text |
Query Template Variants | query_variants/ | Used by dsqgen to generate alternative executable query text |
Table definitions in ANSI SQL | tpcds.sql, tpcds_source.sql | Sample implementation of the logical schema for the data warehouse. |
Data Maintenance Functions in ANSI SQL | data_maintenance/ | Sample implementation of the SQL needed for the Data Maintenance phase of the benchmark |
Answer Sets | answer_sets/ | Used to verify the initial population of the data warehouse. |
Reference Data Set | run dsdgen with – validate flag | Set of files for each scale factor to compare the correct data generation of base data. |
Blog
Code
Documentation / Reference
- Beter. See the word doc with TOC in the downloaded artifact at TPC_HOME\specification\specification.docx
- To continue page 61