
Commit b852f68

Add TPC-DS generator (#9033)
1 parent b84c274 commit b852f68


307 files changed

+128314
-9
lines changed

Lines changed: 30 additions & 0 deletions
@@ -0,0 +1,30 @@
<?xml version="1.0" encoding="utf-8"?>
<VisualStudioToolFile
	Name="Cygwin Tools"
	Version="8.00"
	>
	<Rules>
		<CustomBuildRule
			Name="Bison"
			DisplayName="Bison"
			CommandLine="bison -y -d [inputs] -o y.tab.c"
			Outputs="y.tab.c;y.tab.h"
			FileExtensions="*.y"
			ExecutionDescription="Generating parser..."
			>
			<Properties>
			</Properties>
		</CustomBuildRule>
		<CustomBuildRule
			Name="Flex"
			DisplayName="Flex"
			CommandLine="flex -otokenizer.c [inputs]"
			Outputs="tokenizer.c"
			FileExtensions="*.l"
			ExecutionDescription="Generating scanner..."
			>
			<Properties>
			</Properties>
		</CustomBuildRule>
	</Rules>
</VisualStudioToolFile>

ydb/library/benchmarks/gen/tpcds-dbgen/EULA.txt

Lines changed: 79 additions & 0 deletions
Large diffs are not rendered by default.

ydb/library/benchmarks/gen/tpcds-dbgen/Makefile.suite

Lines changed: 685 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 201 additions & 0 deletions
@@ -0,0 +1,201 @@
/*
 * Legal Notice
 *
 * This document and associated source code (the "Work") is a part of a
 * benchmark specification maintained by the TPC.
 *
 * The TPC reserves all right, title, and interest to the Work as provided
 * under U.S. and international laws, including without limitation all patent
 * and trademark rights therein.
 *
 * No Warranty
 *
 * 1.1 TO THE MAXIMUM EXTENT PERMITTED BY APPLICABLE LAW, THE INFORMATION
 * CONTAINED HEREIN IS PROVIDED "AS IS" AND WITH ALL FAULTS, AND THE
 * AUTHORS AND DEVELOPERS OF THE WORK HEREBY DISCLAIM ALL OTHER
 * WARRANTIES AND CONDITIONS, EITHER EXPRESS, IMPLIED OR STATUTORY,
 * INCLUDING, BUT NOT LIMITED TO, ANY (IF ANY) IMPLIED WARRANTIES,
 * DUTIES OR CONDITIONS OF MERCHANTABILITY, OF FITNESS FOR A PARTICULAR
 * PURPOSE, OF ACCURACY OR COMPLETENESS OF RESPONSES, OF RESULTS, OF
 * WORKMANLIKE EFFORT, OF LACK OF VIRUSES, AND OF LACK OF NEGLIGENCE.
 * ALSO, THERE IS NO WARRANTY OR CONDITION OF TITLE, QUIET ENJOYMENT,
 * QUIET POSSESSION, CORRESPONDENCE TO DESCRIPTION OR NON-INFRINGEMENT
 * WITH REGARD TO THE WORK.
 * 1.2 IN NO EVENT WILL ANY AUTHOR OR DEVELOPER OF THE WORK BE LIABLE TO
 * ANY OTHER PARTY FOR ANY DAMAGES, INCLUDING BUT NOT LIMITED TO THE
 * COST OF PROCURING SUBSTITUTE GOODS OR SERVICES, LOST PROFITS, LOSS
 * OF USE, LOSS OF DATA, OR ANY INCIDENTAL, CONSEQUENTIAL, DIRECT,
 * INDIRECT, OR SPECIAL DAMAGES WHETHER UNDER CONTRACT, TORT, WARRANTY,
 * OR OTHERWISE, ARISING IN ANY WAY OUT OF THIS OR ANY OTHER AGREEMENT
 * RELATING TO THE WORK, WHETHER OR NOT SUCH AUTHOR OR DEVELOPER HAD
 * ADVANCE NOTICE OF THE POSSIBILITY OF SUCH DAMAGES.
 *
 * Contributors:
 * Gradient Systems
 */

1. General
1.1 Makefile.suite
1.2 Executables and usage
2. Platform-Specific Issues
2.1 Linux
2.2 AIX
2.3 Windows
3. Troubleshooting
3.1 Manifest
3.2 Make Depend
4. Extensions

1. General
============
Porting of DBGEN is intended to be very straightforward. The code is written
in C. Any required changes should be limited to the files outlined below. If
you encounter any problems porting the code to a new environment (i.e., one
not mentioned in section 2), please contact Jack Stephens
(jms@gradientsystems.com).

1.1 Makefile.suite
Copy Makefile.suite to Makefile in the installation directory.
The changes to the Makefile should be limited to the variable
definitions in the first few lines of the file:
CC: an ANSI C compiler
OS: one of LINUX, WIN32, AIX, SOLARIS, HPUX

OS-specific changes are detailed in section 2, below. Once any required
changes have been made, it should be possible to create the required
executables by running 'make'.
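As an illustration only: assuming the Makefile forwards the OS setting to the compiler as a preprocessor define (for example -DLINUX), the C sources can branch on it at compile time. The enum, macro, and function names below are hypothetical and are not taken from config.h.

```c
/* Hypothetical sketch: consuming the Makefile's OS value, assuming it is
 * passed to the compiler as -D$(OS) (e.g. -DLINUX).
 * Names are illustrative only, not taken from config.h. */

typedef enum {
    OS_UNKNOWN = 0,
    OS_LINUX,
    OS_WIN32,
    OS_AIX,
    OS_SOLARIS,
    OS_HPUX
} build_os_t;

/* Select exactly one platform; the first matching define wins. */
#if defined(LINUX)
#define BUILD_OS OS_LINUX
#elif defined(WIN32)
#define BUILD_OS OS_WIN32
#elif defined(AIX)
#define BUILD_OS OS_AIX
#elif defined(SOLARIS)
#define BUILD_OS OS_SOLARIS
#elif defined(HPUX)
#define BUILD_OS OS_HPUX
#else
#define BUILD_OS OS_UNKNOWN /* no OS define was given */
#endif

/* Report which platform branch was compiled in. */
build_os_t build_os(void)
{
    return BUILD_OS;
}
```

Compiled with cc -DLINUX, build_os() returns OS_LINUX; with none of the five defines present it returns OS_UNKNOWN, which makes a missing OS setting easy to detect.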

1.2 Executables and usage
The make command should produce three executables:
-- distcomp: a distribution compiler
-- dbgen2: the data generator
-- qgen2: the query generator

dbgen2 is the data generator for TPC-DS. It produces flat files that
populate the data warehouse schema. See the README file for more
information on its use.

qgen2 is the query generator for TPC-DS. It translates query templates
into valid SQL. See the README file for more information on its use.

distcomp compiles the ASCII distribution definitions found in the .dst
files into a binary form, stored in tpcds.idx. Both dbgen2 and qgen2 rely
on this binary file, and it must be distributed along with any
executables. It is not necessary to distribute distcomp or the .dst
files.
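A toy model may help picture what distcomp produces. This sketch assumes a compiled distribution is simply a list of values with integer weights and selects a value by walking the cumulative weights; the type and function names are hypothetical, and the actual tpcds.idx layout and the lookup code in dist.c may differ.

```c
/* Hypothetical, simplified model of a compiled distribution:
 * values paired with integer weights. This is NOT the tpcds.idx format. */
#include <stddef.h>

typedef struct {
    const char *const *values; /* value strings */
    const int *weights;        /* non-negative weight per value */
    size_t count;              /* number of entries */
} toy_dist_t;

/* Sum of all weights; a random draw should fall in [0, total). */
int toy_dist_total(const toy_dist_t *d)
{
    int total = 0;
    for (size_t i = 0; i < d->count; i++)
        total += d->weights[i];
    return total;
}

/* Map a draw r in [0, total) to a value by walking cumulative weights.
 * Returns NULL if r is outside that range. */
const char *toy_dist_pick(const toy_dist_t *d, int r)
{
    if (r < 0)
        return NULL;
    int cumulative = 0;
    for (size_t i = 0; i < d->count; i++) {
        cumulative += d->weights[i];
        if (r < cumulative)
            return d->values[i];
    }
    return NULL; /* r >= total */
}
```

With weights {1, 2, 7}, draws 0, 1-2, and 3-9 select the first, second, and third value respectively, so the third value is chosen for 70% of uniform draws.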


2. Platform-Specific Issues
==============================
The code for these utilities has been structured to minimize the changes
required to move it from one platform to the next. The following sections
detail the environments under which it has been tested and the
configuration changes required.

2.1 Linux
Testing was completed under Red Hat Linux 8.0 on an Intel platform. Makefile
settings/changes were:
OS = LINUX

2.2 AIX
Testing was completed under AIX 5.1. Makefile settings/changes were:
OS = AIX

2.3 Windows
Testing was completed under Windows 2000 Professional, using Visual
C++ 6.0. The makefile is not used in this environment, but the
distribution includes workspace and project files that should allow the
executables to be built without further changes. The test configuration
stored the source files in c:\tpc\tpcds, but the internal paths appear to
be relative and should allow relocation.

Most Windows installations do not include Lex or Yacc, the scanner- and
parser-generation tools. The distribution includes the files they would
generate (tokenizer.c, qgen.c, y.tab.h). Should it be necessary to
regenerate these files, build the grammar project within the DBGEN2 workspace.


3. Troubleshooting
==================
The source files are detailed below. It is likely that most issues can be
resolved with minor corrections to config.h or porting.h. Please forward
any problem reports, and any suggested corrections, to the subcommittee
and Jack.
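Corrections to config.h or porting.h often involve integer widths and printf formats. The fragment below is a hypothetical example of such a porting define, assuming a C99 toolchain that provides <inttypes.h> (Visual C++ 6.0, the Windows compiler mentioned in section 2.3, does not ship that header, so an older toolchain would need a different definition). The names port_key_t, PORT_KEY_FMT, and port_format_key are invented for illustration and are not the ones used in porting.h.

```c
/* Hypothetical porting.h-style fragment (names are illustrative).
 * Assumes a C99 toolchain that provides <inttypes.h>. */
#include <inttypes.h>
#include <stdio.h>

/* One place to change if a platform needs a different 64-bit type. */
typedef int64_t port_key_t;
#define PORT_KEY_FMT "%" PRId64

/* Format a key into buf; returns the snprintf result, i.e. the length
 * the full output needs, excluding the terminating NUL. */
int port_format_key(char *buf, size_t size, port_key_t key)
{
    return snprintf(buf, size, PORT_KEY_FMT, key);
}
```

Keeping the type and its format string side by side means a platform port touches one header instead of every printf call site.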

3.1 Manifest
Build files
----------------------
Makefile.suite: make input file
dbgen2.dsp: Project file (Windows only)
dbgen2.dsw: Workspace file (Windows only)
qgen2.dsp: Project file (Windows only)
distcomp.dsp: Project file (Windows only)
BUGS: Documentation
HISTORY: Documentation
PORTING.NOTES: Documentation
README: Documentation

dbgen2/qgen2 files
----------------------
build.c: table population routines
build_support.c
build_support.h
columns.h: schema definitions
config.h: porting defines
constants.h: schema definitions
date.c: data type support
date.h: data type support
decimal.c: data type support
decimal.h: data type support
dist.c: distribution support
dist.h: distribution support
driver.c: dbgen2 main routines
driver.h: dbgen2 main routines
error_msg.h
genrand.c: RNG routines
genrand.h: RNG routines
grammar.c: general grammar routines, used by qgen2 and distcomp
grammar.h: general grammar routines, used by qgen2 and distcomp
load.c: in-line load stubs
load.h: in-line load stubs
misc.c
misc.h
newqgen.c: qgen2 main routines
newqgen.h: qgen2 main routines
parallel.c: parallelism stubs
parallel.h: parallelism stubs
params.h: command line support
porting.h: porting defines
print.c: table print routines
qgen_params.h: command line support
r_params.c: command line support
r_params.h: command line support
tables.h: schema definitions
tdefs.h: schema definitions
template.c: qgen2 template parsing routines
template.h: qgen2 template parsing routines
text.c: data type support

Distribution/distcomp files
----------------------
dcgram.c: grammar definition
dcgram.h: grammar definition
dcomp.c: distcomp main routine
dcomp.h: distcomp main routine
dcomp_params.h: command line options
calendar.dst: distribution definitions; included in tpcds.dst
cities.dst: distribution definitions; included in tpcds.dst
english.dst: distribution definitions; included in tpcds.dst
fips.dst: distribution definitions; included in tpcds.dst
names.dst: distribution definitions; included in tpcds.dst
streets.dst: distribution definitions; included in tpcds.dst
tpcds.dst: distribution definitions

3.2 Make Depend
The dependencies in Makefile.suite have been hand-coded to aid portability.
If you have trouble compiling for a particular platform and makedepend is
available, then 'make depend' should introduce any required,
platform-specific dependencies.


4. Extensions
=============
TBD