🚧 Work In Progress
This project is still under active development. The following documentation is AI-generated and requires future cleanup and validation.
This is a Rust rewrite of datafusion-sqlancer, originally implemented in Java. The rewrite aims to simplify the implementation, enable better integration with existing DataFusion tooling, and make test oracles applicable to sqllogictests. See this issue for more details on the motivation behind the Rust rewrite.
A comprehensive fuzzing tool for Apache DataFusion, designed to test SQL query execution and find potential bugs, crashes, or inconsistencies in the query engine.
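SQLancer, the Java project this tool descends from, is known for detecting such inconsistencies with test oracles like Ternary Logic Partitioning (TLP). This README does not say which oracles the Rust rewrite implements, so the following is only a minimal sketch of the TLP idea under that assumption: the row count of a query must equal the sum of the counts for WHERE p, WHERE NOT p, and WHERE p IS NULL. SQL's three-valued logic is modeled as Option<bool>. Row, Partition, tlp_holds, and the two "engines" (count_correct and a deliberately broken count_buggy) are hypothetical stand-ins for running real queries against DataFusion.

```rust
/// Three-valued SQL boolean: Some(true), Some(false), or None (NULL).
type SqlBool = Option<bool>;

/// Hypothetical row type: a single nullable integer column `x`.
#[derive(Clone, Copy, Debug)]
struct Row {
    x: Option<i64>,
}

/// Predicate `x > threshold` with SQL semantics: a NULL input yields NULL.
fn gt(row: &Row, threshold: i64) -> SqlBool {
    row.x.map(|v| v > threshold)
}

/// Which TLP partition a filtered query selects.
#[derive(Clone, Copy)]
enum Partition {
    IsTrue,  // WHERE p
    IsFalse, // WHERE NOT p
    IsNull,  // WHERE p IS NULL
}

/// A correct "engine": counts rows whose predicate value falls into the partition.
fn count_correct(rows: &[Row], threshold: i64, part: Partition) -> usize {
    rows.iter()
        .filter(|r| {
            let v = gt(r, threshold);
            match part {
                Partition::IsTrue => v == Some(true),
                Partition::IsFalse => v == Some(false),
                Partition::IsNull => v.is_none(),
            }
        })
        .count()
}

/// A deliberately buggy "engine" whose NOT p filter also keeps NULL rows.
fn count_buggy(rows: &[Row], threshold: i64, part: Partition) -> usize {
    rows.iter()
        .filter(|r| {
            let v = gt(r, threshold);
            match part {
                Partition::IsTrue => v == Some(true),
                Partition::IsFalse => v != Some(true), // bug: NULL treated as false
                Partition::IsNull => v.is_none(),
            }
        })
        .count()
}

/// TLP check: |q| == |q WHERE p| + |q WHERE NOT p| + |q WHERE p IS NULL|.
fn tlp_holds(rows: &[Row], threshold: i64, engine: fn(&[Row], i64, Partition) -> usize) -> bool {
    let parts = engine(rows, threshold, Partition::IsTrue)
        + engine(rows, threshold, Partition::IsFalse)
        + engine(rows, threshold, Partition::IsNull);
    rows.len() == parts
}

fn main() {
    let rows = [Row { x: Some(1) }, Row { x: None }, Row { x: Some(7) }];
    println!("correct engine passes TLP: {}", tlp_holds(&rows, 3, count_correct));
    println!("buggy engine passes TLP: {}", tlp_holds(&rows, 3, count_buggy));
}
```

Running main prints true for the correct engine and false for the buggy one: the buggy NOT p partition also counts the NULL row, so the three partitions add up to 4 rows instead of 3. The appeal of this kind of oracle is that it needs no reference database, only the engine under test.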
To run the fuzzer with default settings:
cargo run --release
To run with a custom configuration:
cargo run --release -- --config datafusion-fuzzer.toml
To override configuration-file settings with command-line options:
cargo run --release -- --config datafusion-fuzzer.toml --rounds 5 --queries-per-round 20
The fuzzer supports extensive configuration options to customize the fuzzing process.
You can configure DataFusion Fuzzer in two ways:
- Configuration file: Use a TOML file to specify detailed settings
- Command-line arguments: Override configuration file settings, or use them on their own without a configuration file
See datafusion-fuzzer.toml for an example configuration file:
# Fuzzing execution settings
seed = 42
rounds = 3
queries_per_round = 10
timeout_seconds = 30
# Logging settings
display_logs = true
enable_tui = false
# log_path = "logs/datafusion-fuzzer.log"
# Table generation parameters
max_column_count = 5
max_row_count = 100
max_expr_level = 3
max_table_count = 3
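As a rough illustration of how a file like the one above maps onto settings, the sketch below reads only the flat key = value subset of TOML that this example uses (no tables, arrays, or trailing comments) and falls back to the defaults given in the generation-parameter descriptions for any missing key. The real tool's loader is not described in this README and presumably uses a proper TOML library; parse_flat, TableGenSettings, get_or, and table_gen_settings are hypothetical names.

```rust
use std::collections::HashMap;

/// Parse the flat `key = value` subset of TOML used by the example file.
/// Not a real TOML parser: tables, arrays, and trailing comments are unsupported.
fn parse_flat(text: &str) -> HashMap<String, String> {
    let mut map = HashMap::new();
    for line in text.lines() {
        let line = line.trim();
        // Skip blank lines and whole-line `#` comments (e.g. the commented-out log_path).
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            // Strip surrounding whitespace and the double quotes of string values.
            let value = value.trim().trim_matches('"');
            map.insert(key.trim().to_string(), value.to_string());
        }
    }
    map
}

/// Hypothetical generation settings, defaulting to the values listed in this README.
#[derive(Debug, PartialEq)]
struct TableGenSettings {
    max_table_count: usize,
    max_column_count: usize,
    max_row_count: usize,
    max_expr_level: usize,
}

impl Default for TableGenSettings {
    fn default() -> Self {
        TableGenSettings {
            max_table_count: 3,
            max_column_count: 5,
            max_row_count: 100,
            max_expr_level: 3,
        }
    }
}

/// Read a numeric key, falling back to `default` if it is missing or not a number.
fn get_or(map: &HashMap<String, String>, key: &str, default: usize) -> usize {
    map.get(key).and_then(|v| v.parse().ok()).unwrap_or(default)
}

fn table_gen_settings(map: &HashMap<String, String>) -> TableGenSettings {
    let d = TableGenSettings::default();
    TableGenSettings {
        max_table_count: get_or(map, "max_table_count", d.max_table_count),
        max_column_count: get_or(map, "max_column_count", d.max_column_count),
        max_row_count: get_or(map, "max_row_count", d.max_row_count),
        max_expr_level: get_or(map, "max_expr_level", d.max_expr_level),
    }
}

fn main() {
    // A subset of the example file; max_table_count and max_expr_level are omitted.
    let text = r#"
seed = 42
# log_path = "logs/datafusion-fuzzer.log"
max_column_count = 5
max_row_count = 100
"#;
    let map = parse_flat(text);
    assert!(!map.contains_key("log_path")); // commented out, so left unset
    let settings = table_gen_settings(&map);
    println!("{:?}", settings);
}
```

A commented-out key such as log_path simply ends up unset, and omitted generation parameters take their documented defaults.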
Options:
-c, --config <FILE> Path to config file
-s, --seed <SEED> Random seed [default: 42]
-r, --rounds <ROUNDS> Number of rounds to run
-q, --queries-per-round <QUERIES> Number of queries per round
-t, --timeout <TIMEOUT> Query timeout in seconds
-l, --log-path <LOG_PATH> Path to log file
-h, --help Print help
-V, --version Print version
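The precedence described earlier (command-line arguments override the configuration file) can be sketched as a field-by-field merge in which every setting is an Option and a value given on the command line wins. RunSettings, merged_with, and effective_seed are hypothetical; the README does not show how the fuzzer actually combines its two sources. The main function replays the earlier --rounds 5 --queries-per-round 20 example.

```rust
/// Hypothetical run settings: `None` means "not specified by this source".
#[derive(Debug, Clone, Default, PartialEq)]
struct RunSettings {
    seed: Option<u64>,
    rounds: Option<u32>,
    queries_per_round: Option<u32>,
    timeout_seconds: Option<u64>,
    log_path: Option<String>,
}

impl RunSettings {
    /// Layer `overrides` (e.g. parsed CLI flags) on top of `self` (e.g. the config file):
    /// any field set in `overrides` wins; otherwise the file value is kept.
    fn merged_with(self, overrides: RunSettings) -> RunSettings {
        RunSettings {
            seed: overrides.seed.or(self.seed),
            rounds: overrides.rounds.or(self.rounds),
            queries_per_round: overrides.queries_per_round.or(self.queries_per_round),
            timeout_seconds: overrides.timeout_seconds.or(self.timeout_seconds),
            log_path: overrides.log_path.or(self.log_path),
        }
    }

    /// Seed actually used: applies the documented default of 42 only after merging.
    fn effective_seed(&self) -> u64 {
        self.seed.unwrap_or(42)
    }
}

fn main() {
    // Values from the example datafusion-fuzzer.toml.
    let file = RunSettings {
        seed: Some(42),
        rounds: Some(3),
        queries_per_round: Some(10),
        timeout_seconds: Some(30),
        log_path: None,
    };
    // Equivalent of `--rounds 5 --queries-per-round 20`.
    let cli = RunSettings {
        rounds: Some(5),
        queries_per_round: Some(20),
        ..Default::default()
    };
    let merged = file.merged_with(cli);
    println!("{:?} (seed {})", merged, merged.effective_seed());
}
```

One design choice worth noting: the help text lists a default of 42 for --seed. If a parser filled that default in before merging, the command-line value would always shadow a seed set in the file; keeping unspecified flags as None and applying the default last avoids that. Whether the real tool behaves this way is not documented here.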
Generation parameters:
- max_table_count: Maximum number of tables that can be selected in a single query (default: 3)
- max_column_count: Maximum number of columns per generated table (default: 5)
- max_row_count: Maximum number of rows per generated table (default: 100)
- max_expr_level: Maximum expression nesting level (default: 3)
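To illustrate what max_expr_level bounds, here is a sketch of a depth-limited random expression generator: once the current level reaches the limit it only emits leaves (a column reference or an integer literal), so no generated tree is deeper than max_expr_level. The xorshift PRNG stands in for whatever random source the fuzzer really uses and illustrates how a fixed seed can make runs reproducible. Expr, gen_expr, XorShift64, and the c0, c1, ... column names are all hypothetical, and only arithmetic operators are used so the output stays well-typed.

```rust
/// Tiny xorshift64 PRNG (an assumption for this sketch, not the fuzzer's RNG).
struct XorShift64(u64);

impl XorShift64 {
    fn new(seed: u64) -> Self {
        // xorshift must not start from a zero state.
        XorShift64(if seed == 0 { 0x9E37_79B9_7F4A_7C15 } else { seed })
    }
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }
    /// Value in 0..n (slightly biased, which is fine for a fuzzing sketch).
    fn below(&mut self, n: u64) -> u64 {
        self.next_u64() % n
    }
}

/// Hypothetical expression tree: columns, literals, and binary operators.
#[derive(Debug)]
enum Expr {
    Column(usize),
    Literal(i64),
    Binary(Box<Expr>, &'static str, Box<Expr>),
}

impl Expr {
    /// Nesting depth: a leaf has depth 1.
    fn depth(&self) -> u32 {
        match self {
            Expr::Column(_) | Expr::Literal(_) => 1,
            Expr::Binary(l, _, r) => 1 + l.depth().max(r.depth()),
        }
    }

    /// Render as SQL text with explicit parentheses.
    fn to_sql(&self) -> String {
        match self {
            Expr::Column(i) => format!("c{}", i),
            Expr::Literal(v) => v.to_string(),
            Expr::Binary(l, op, r) => format!("({} {} {})", l.to_sql(), op, r.to_sql()),
        }
    }
}

/// Generate an expression of depth at most `max_level`, starting at `level` = 1.
/// At the last allowed level only leaves are produced.
fn gen_expr(rng: &mut XorShift64, level: u32, max_level: u32, column_count: usize) -> Expr {
    if level >= max_level || rng.below(3) == 0 {
        if rng.below(2) == 0 {
            Expr::Column(rng.below(column_count as u64) as usize)
        } else {
            Expr::Literal(rng.below(100) as i64)
        }
    } else {
        const OPS: [&str; 3] = ["+", "-", "*"];
        let op = OPS[rng.below(OPS.len() as u64) as usize];
        let left = gen_expr(rng, level + 1, max_level, column_count);
        let right = gen_expr(rng, level + 1, max_level, column_count);
        Expr::Binary(Box::new(left), op, Box::new(right))
    }
}

fn main() {
    let max_expr_level = 3; // README default
    let max_column_count = 5; // README default
    let mut rng = XorShift64::new(42); // default seed
    for _ in 0..3 {
        let e = gen_expr(&mut rng, 1, max_expr_level, max_column_count);
        assert!(e.depth() <= max_expr_level);
        println!("depth {}: {}", e.depth(), e.to_sql());
    }
}
```

Callers must pass max_level >= 1 and a non-zero column_count; the sketch does not validate either. Because every choice comes from the seeded PRNG, the same seed always yields the same sequence of expressions.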
- where
- sort + limit, offset
- aggregate
- having
- join
- union/union all/intersect/except
- views
- scalar subquery
- 'relation-like' subquery
- Operators
- Scalar functions
- Aggregate functions
- Window functions
- Complete primitive types
- Time-related types
- Array types
- Struct/Json
- CLI
- Oracle interface