dsdgen generates the data sets for the benchmark (initial and refresh data).
dsdgen always reads the tpcds.idx file from the current directory, so that file must be present there.
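Because tpcds.idx is read from the current directory, one way to avoid a missing-file failure is to check for the file and change into the tools directory before running dsdgen. The following is a minimal sketch, not part of the TPC-DS kit: TOOLS_DIR, has_idx and run_dsdgen are hypothetical names, and the default path is only an assumption about where the tools were built.

```shell
#!/bin/sh
# TOOLS_DIR is a hypothetical location of the built TPC-DS tools
# (the directory holding both dsdgen and tpcds.idx).
TOOLS_DIR="${TOOLS_DIR:-./tools}"

# Succeed only if the given directory contains tpcds.idx.
has_idx() {
    [ -f "$1/tpcds.idx" ]
}

# Run dsdgen with TOOLS_DIR as the working directory.
# The subshell keeps the caller's working directory unchanged.
run_dsdgen() {
    if ! has_idx "$TOOLS_DIR"; then
        echo "tpcds.idx not found in $TOOLS_DIR" >&2
        return 1
    fi
    ( cd "$TOOLS_DIR" && ./dsdgen "$@" )
}
```

With the functions defined, a call such as `run_dsdgen -scale 1 -dir /tmp/tpcds-data` would generate a 1 GB data set, assuming the Linux build accepts options with a leading `-`. Because dsdgen runs inside TOOLS_DIR, a relative `-dir` path would be resolved against that directory, so an absolute path is the safer choice.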
Full help output:
# Windows
dsdgen.exe /help
# Linux
dsdgen -h
dsdgen Population Generator (Version 2.8.0)
Copyright Transaction Processing Performance Council (TPC) 2001 - 2018
USAGE: dsdgen [options]
Note: When defined in a parameter file (using -p), parmeters should
use the form below. Each option can also be set from the command
line, using a form of '/param [optional argument]'
Unique anchored substrings of options are also recognized, and
case is ignored, so '/sc' is equivalent to '/SCALE'
General Options
===============
ABREVIATION = <s> -- build table with abreviation <s>
DIR = <s> -- generate tables in directory <s>
HELP = <n> -- display this message
PARAMS = <s> -- read parameters from file <s>
QUIET = [Y|N] -- disable all output to stdout/stderr
SCALE = <n> -- volume of data to generate in GB
TABLE = <s> -- build only table <s>
UPDATE = <n> -- generate update data set <n>
VERBOSE = [Y|N] -- enable verbose output
PARALLEL = <n> -- build data in <n> separate chunks
CHILD = <n> -- generate <n>th chunk of the parallelized data
RELEASE = [Y|N] -- display the release information
_FILTER = [Y|N] -- output data to stdout
VALIDATE = [Y|N] -- produce rows for data validation
Advanced Options
===============
DELIMITER = <s> -- use <s> as output field separator
DISTRIBUTIONS = <s> -- read distributions from file <s>
FORCE = [Y|N] -- over-write data files without prompting
SUFFIX = <s> -- use <s> as output file suffix
TERMINATE = [Y|N] -- end each record with a field delimiter
VCOUNT = <n> -- set number of validation rows to be produced
VSUFFIX = <s> -- set file suffix for data validation
RNGSEED = <n> -- set RNG seed
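As an illustration of how PARALLEL and CHILD fit together, the sketch below prints one dsdgen command line per chunk without executing anything. It assumes the Linux `-option value` form and that chunks are numbered from 1 to the PARALLEL value; print_chunk_commands is a hypothetical helper name, not part of the kit.

```shell
#!/bin/sh
# Print one dsdgen command line per chunk; nothing is executed.
# Arguments: scale in GB, output directory, number of chunks.
# Assumption: CHILD values run from 1 up to the PARALLEL value.
print_chunk_commands() {
    pcc_scale=$1
    pcc_dir=$2
    pcc_chunks=$3
    pcc_child=1
    while [ "$pcc_child" -le "$pcc_chunks" ]; do
        echo "./dsdgen -scale $pcc_scale -dir $pcc_dir -parallel $pcc_chunks -child $pcc_child"
        pcc_child=$((pcc_child + 1))
    done
}
```

For example, `print_chunk_commands 100 /data 4` prints four command lines. Each one can then be started as its own process from the directory that contains tpcds.idx.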
Notes:
The output of dsdgen is text.
The data generated by dsdgen includes some international characters.
See also https://github.com/teradata/tpcds for a Java implementation of dsdgen.
The test database must be verified for correct data content. This must be done after the initial database load and before any performance tests. A validation data set is produced by running dsdgen with the "-validate" and "-vcount" options. The minimum value for "-vcount" is 50, which produces 50 rows of validation data for most tables. The exceptions are the "returns" fact tables, which will have only 5 rows each on average, and the dimension tables with fewer than 50 total rows.
This does not work in dsdgen version 2.8.0.