OLAP Workload Testing & Benchmarking Suite

A comprehensive benchmark framework for evaluating analytical database performance using real-world datasets and standardized testing methodologies.


Overview

This benchmark suite provides standardized testing for Online Analytical Processing (OLAP) workloads across multiple database engines. The framework enables fair, reproducible performance comparisons using real-world analytical queries and datasets, helping organizations make informed decisions about analytical database technologies.

Scope and Purpose

This benchmark specifically focuses on open-source analytical databases that can be deployed on-premises and containerized with Docker. While we acknowledge the existence of excellent cloud-native solutions like Snowflake, Amazon Redshift, Google BigQuery, and Azure Synapse Analytics, these managed services are outside the scope of this evaluation.

Why Open-Source, On-Premises, and Dockerizable?

  • Data Sovereignty: Organizations requiring complete control over their data location and governance
  • Cost Predictability: Fixed infrastructure costs vs. variable cloud consumption pricing
  • Compliance Requirements: Industries with strict data residency and security regulations
  • Technology Independence: Avoiding vendor lock-in with portable, standards-based solutions
  • Customization Flexibility: Ability to modify and tune systems for specific use cases
  • Container Portability: Docker deployment enables consistent environments across development, testing, and production
  • Infrastructure Agnostic: Run on any Docker-compatible platform, from laptops to enterprise clusters

This benchmark evaluates databases that can be deployed in your own infrastructure using Docker containers, whether on bare metal, virtual machines, Kubernetes clusters, or private cloud environments, providing organizations with truly portable and self-managed analytical solutions.

Quick Results Summary

Top Performers:

  • 🥇 ClickHouse: 6.41s total, 100% success rate - Best overall performance
  • 🥈 StarRocks: 6.76s total, 100% success rate - Excellent consistency
  • 🥉 Apache Doris: 7.66s total, 100% success rate - Balanced performance across all queries

Key Findings:

  • ClickHouse and StarRocks dominate performance benchmarks
  • TiDB/TiFlash: 100% query success rate, but markedly slower data ingestion (12m 15s load time) and analytical performance than the OLAP-focused engines
  • MariaDB ColumnStore shows compatibility issues with complex queries, but has the fastest data load time
  • Real-world flight data (38M+ records) provides realistic testing scenarios

OLAP vs OLTP

OLAP systems are designed to handle complex analytical queries that typically involve:

  • Large-scale aggregations across millions of records
  • Complex multi-table joins and dimensional analysis
  • Time-series analysis and trend identification
  • Ad-hoc analytical queries with varying selectivity

Unlike OLTP (Online Transaction Processing) systems optimized for high-volume transactional workloads, OLAP databases excel at read-heavy analytical operations, making them essential for business intelligence, data analytics, and decision support systems.

Why OLAP Benchmarking Matters

Performance Evaluation

Modern organizations generate massive volumes of data requiring fast analytical processing. Benchmarking helps identify which database engines can handle your specific analytical workloads with acceptable performance characteristics.

Technology Selection

With numerous analytical database options available, from traditional columnar stores to modern cloud-native solutions, standardized benchmarking provides objective performance metrics to guide technology decisions.

Cost Optimization

Understanding performance characteristics across different systems enables better resource planning and cost optimization, especially in cloud environments where compute and storage costs directly impact operational expenses.

Dataset Description

The benchmark utilizes the Bureau of Transportation Statistics (BTS) On-Time Performance dataset, representing typical analytical scenarios found in operational reporting and business intelligence applications.

Schema Design (Star Schema)

| Table    | Type      | Records    | Description                          |
|----------|-----------|------------|--------------------------------------|
| airlines | Dimension | 30         | Airline carrier information          |
| airports | Dimension | 400        | Airport details and geographic data  |
| flights  | Fact      | 38,083,735 | Flight departure/arrival data        |

This star schema design reflects typical data warehouse patterns, enabling realistic testing of dimension-fact table joins, aggregations, and filtering operations.
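
As a quick illustration, once one of the engines from the Quick Start below is loaded, a single dimension-fact join exercises this schema end to end. A minimal sketch using the ClickHouse client (column names follow the example queries in the Benchmark Methodology section; adjust to the actual DDL, and assume the container publishes the native port on localhost):

# Top five carriers by flight volume - a typical dimension-fact join
clickhouse-client --host 127.0.0.1 --query "
    SELECT a.airline, COUNT(*) AS flights
    FROM flights f
    JOIN airlines a ON f.carrier = a.iata_code
    GROUP BY a.airline
    ORDER BY flights DESC
    LIMIT 5"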

Supported Database Engines

Tested Versions:

  • ClickHouse v23.8+
  • Apache Doris v3.0+
  • StarRocks v3.0+
  • TiDB v7.0+ (with TiFlash)
  • MariaDB ColumnStore v23.02.4+

Purpose-Built OLAP Systems

  • Apache Doris: Open-source MPP analytical database with vectorized execution engine, designed for real-time analytics and high-performance OLAP workloads
  • ClickHouse: Open-source MPP columnar analytical database with vectorized execution, optimized for real-time analytics, time-series data, and high-performance OLAP workloads
  • StarRocks: Open-source MPP analytical database with vectorized execution engine, designed for real-time analytics and high-performance OLAP workloads

Hybrid and Specialized Systems

  • MariaDB ColumnStore: Columnar storage engine for analytical workloads within the MariaDB ecosystem
  • PingCAP TiDB: While primarily designed as an HTAP (Hybrid Transactional/Analytical Processing) system for OLTP workloads, TiDB includes TiFlash, a columnar storage extension that enables analytical query processing. TiFlash provides columnar replicas of TiDB data, allowing the system to handle both transactional and analytical workloads within a unified architecture.

Note: TiDB is not primarily an OLAP product but offers analytical capabilities through TiFlash.

Prerequisites & System Requirements

Software Dependencies

Docker & Docker Compose

Python 3.12+ with required packages (install with pip install -r requirements.txt; see the Quick Start below):

TiUP (for TiDB cluster management):

curl --proto '=https' --tlsv1.2 -sSf https://tiup-mirrors.pingcap.com/install.sh | sh

ClickHouse Client (command-line client, used to load data into and query the ClickHouse container):
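
For example, one way to install a standalone client is ClickHouse's official install script, which downloads a self-contained clickhouse binary (package-manager installs work too):

curl https://clickhouse.com/ | sh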

Hardware Recommendations

| Component | Minimum          | Recommended                     | Benchmark System |
|-----------|------------------|---------------------------------|------------------|
| Memory    | 16GB RAM         | 32GB+ RAM                       | 16GB (M1 Pro)    |
| Storage   | 200GB free space | SSD storage                     | 926GB SSD        |
| CPU       | Multi-core       | High-performance multi-core     | M1 Pro (10-core) |
| Network   | Standard         | High bandwidth for data loading | Localhost        |

Note: The benchmark results shown were obtained on a MacBook Pro M1 Pro with 16GB RAM, demonstrating that excellent performance can be achieved even on laptop-class hardware.

Quick Start Guide

1. Environment Setup

git clone <repository-url>
cd analytics_benchmark

# Install Python dependencies
pip install -r requirements.txt

2. Data Preparation

python3 load/get_data.py

3. Database-Specific Execution

Apache Doris

# Start Apache Doris
docker compose -f docker/doris.yml up -d

# Load data (note: Doris requires additional time for backend nodes to initialize)
python3 load/load_data.py --database doris

# Run benchmark
python3 run_benchmarks.py --database doris

ClickHouse

# Start ClickHouse
docker compose -f docker/clickhouse.yml up -d

# Load data
python3 load/load_data.py --database clickhouse

# Run benchmark
python3 run_benchmarks.py --database clickhouse

MariaDB ColumnStore

# Start MariaDB ColumnStore
docker compose -f docker/columnstore.yml up -d

# Load data
python3 load/load_data.py --database columnstore

# Run benchmark
python3 run_benchmarks.py --database columnstore

TiDB with TiFlash

# Start TiDB using TiUP (note: TiDB uses TiUP instead of Docker for optimal performance)
tiup playground

# Load data
python3 load/load_data.py --database tidb

# Run benchmark
python3 run_benchmarks.py --database tidb

Note: TiDB uses TiUP for cluster management rather than Docker Compose. TiUP provides better resource management and TiFlash integration for analytical workloads.

StarRocks

# Start StarRocks
docker compose -f docker/starrocks.yml up -d

# Load data
python3 load/load_data.py --database starrocks

# Run benchmark
python3 run_benchmarks.py --database starrocks

Important Notes on Database Startup

Apache Doris & StarRocks: These systems require additional time for backend nodes to initialize. The data loading script will automatically wait for all components to be ready (up to 2 minutes for each system). If you encounter "no available capacity" errors, the backend nodes may still be provisioning - wait longer or restart the containers.
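
If you prefer to check readiness yourself, a minimal sketch (assuming the frontend's MySQL-protocol port 9030 is published on localhost and a mysql client is installed; StarRocks exposes the same interface):

# Poll until at least one backend node reports alive
until mysql -h 127.0.0.1 -P 9030 -u root -e 'SHOW BACKENDS\G' 2>/dev/null | grep -q 'Alive: true'; do
    echo 'Waiting for backend nodes...'
    sleep 5
done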

TiDB: TiFlash columnar replicas need time to synchronize after initial data loading for optimal analytical performance.
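
Replica sync can be monitored from any MySQL client, for example (assuming the default tiup playground SQL port 4000 on localhost):

# AVAILABLE becomes 1 and PROGRESS reaches 1 once TiFlash replicas are in sync
mysql -h 127.0.0.1 -P 4000 -u root -e 'SELECT TABLE_NAME, AVAILABLE, PROGRESS FROM information_schema.tiflash_replica;'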

ClickHouse: Generally starts quickly but may need a few seconds for metadata initialization.

MariaDB ColumnStore: Requires ColumnStore engine initialization, which can take 30-60 seconds.

Reproducibility

To reproduce the exact results shown in this benchmark:

  1. Environment: Use the same hardware specifications listed in Testing Environment
  2. Software Versions: Ensure you're using the database versions specified in Supported Database Engines
  3. Data Consistency: The benchmark uses the same BTS dataset with 38,083,735 flight records
  4. Configuration: Use the provided Docker configurations without modification
  5. Methodology: Follow the cold cache approach - restart containers between test runs

Note: Results may vary slightly due to system load and hardware differences, but relative performance rankings should remain consistent.
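
For step 5, one way to force a cold cache between runs is simply to restart the container and re-run the suite, for example:

# Cold-cache cycle for a single engine (same compose files as in the Quick Start)
docker compose -f docker/clickhouse.yml restart
python3 run_benchmarks.py --database clickhouse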

Benchmark Methodology

Query Suite Design

The benchmark includes a comprehensive set of 20 analytical queries representing common OLAP patterns:

  • Simple aggregations and filtering operations
  • Multi-table joins with dimensional analysis
  • Time-based analysis and seasonal trending
  • Complex analytical functions and window operations
  • Advanced star schema pattern demonstrations
  • Competitive analysis and market intelligence queries
  • Aircraft utilization and operational efficiency analysis
  • Weather impact and disruption pattern analysis
  • Cross-airline performance benchmarking
  • Varying data selectivity and aggregation complexity levels

Example Query Patterns:

The actual benchmark queries are significantly more complex than basic examples. Here are simplified representations of the analytical patterns used:

-- Pattern from Query 1: Multi-CTE airline market analysis
-- (Actual query calculates market share percentages with cross joins)
WITH flight_stats AS (
    SELECT a.airline, COUNT(*) AS volume, SUM(f.cancelled) AS cancelled
    FROM flights f JOIN airlines a ON f.carrier = a.iata_code
    WHERE f.year = 2020 GROUP BY a.airline
)
SELECT fs.airline, fs.volume,
       ROUND(100.0 * fs.volume / total.volume, 2) AS market_share_pct
FROM flight_stats fs CROSS JOIN (SELECT SUM(volume) AS volume FROM flight_stats) total;

-- Pattern from Query 12: Advanced window functions with performance categorization  
-- (Actual query includes month name conversion and performance categories)
WITH monthly_perf AS (
    SELECT a.airline, f.month, AVG(f.arr_delay) AS avg_delay
    FROM flights f JOIN airlines a ON f.carrier = a.iata_code
    GROUP BY a.airline, f.month HAVING COUNT(*) >= 100
)
SELECT airline, month, avg_delay,
       RANK() OVER (PARTITION BY month ORDER BY avg_delay ASC) AS delay_rank,
       CASE WHEN RANK() OVER (PARTITION BY month ORDER BY avg_delay ASC) <= 3
            THEN 'Top Performer' ELSE 'Average' END AS category
FROM monthly_perf;

Note: These are simplified representations. The actual queries in queries/sql/ are more sophisticated, featuring complex CTEs, advanced aggregations, comprehensive business logic, and performance optimizations. View the individual .sql files for complete query implementations.

Performance Metrics

  • Query Execution Time: Individual query performance measurement
  • Data Loading Performance: Bulk data ingestion capabilities
  • Error Handling: System stability and query completion rates
  • Resource Utilization: Memory and CPU consumption patterns
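
For a rough spot check of a single query's wall-clock time outside the harness (run_benchmarks.py performs the actual per-query measurement; the file name below is illustrative):

# Time one query file against ClickHouse
time clickhouse-client --host 127.0.0.1 --query "$(cat queries/sql/query_01.sql)"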

Testing Environment

  • Hardware: MacBook Pro (M1 Pro, 2021)
    • CPU: Apple M1 Pro (10-core: 8 performance + 2 efficiency cores)
    • Memory: 16GB unified memory
    • Storage: 926GB SSD
  • OS: macOS 15.5 (24F74)
  • Containerization: Docker Desktop for Mac
  • Container Resources:
    • Memory limit: 8GB per database container
    • CPU limit: 6 cores per database container
    • Shared volumes for data persistence
  • Methodology: Cold cache scenarios for realistic performance
  • Workload: Single-user workload simulation
  • Resource Allocation: Standardized hardware allocation across all databases
  • Network: Localhost connections to eliminate network latency
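
To confirm the per-container caps (8GB memory, 6 CPUs) are in effect before a run:

# Show live usage against the configured limits for each container
docker stats --no-stream --format 'table {{.Name}}\t{{.MemUsage}}\t{{.CPUPerc}}'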

Performance Results

Last Updated: July 2025 | Dataset: 38,083,735 flight records | Test Environment: MacBook Pro M1 Pro

Data Load Performance

The following table shows data ingestion performance for the complete dataset (38,083,735 flight records):

Load Speed Legend:

  • 🟢 Fast (< 1.5 minutes)
  • 🟡 Moderate (1.5–5 minutes)
  • 🟠 Slow (5–10 minutes)
  • 🔴 Very Slow (> 10 minutes)

| Database     | Import Time   | Rows/s    | Status     | Notes                  |
|--------------|---------------|-----------|------------|------------------------|
| ClickHouse   | 🟢 57.00 sec  | 694,723   | ✅ Success | Native CSV, columnar   |
| Doris        | 🟢 85.00 sec  | 447,940   | ✅ Success | Stream Load, BE warmup |
| StarRocks    | 🟡 100.00 sec | 379,777   | ✅ Success | Vectorized ingest      |
| TiDB/TiFlash | 🔴 735.00 sec | 51,563    | ✅ Success | Lightning, replica     |
| ColumnStore  | 🟢 39.00 sec  | 1,109,264 | ✅ Success | cpimport, conversion   |

Query Execution Performance

The following table shows execution times for each query in the benchmark suite:

Performance Legend:

  • 🟢 Excellent (< 0.5 seconds)
  • 🟡 Good (0.5–2.0 seconds)
  • 🟠 Moderate (2.0–10.0 seconds)
  • 🔴 Slow (> 10.0 seconds)
  • ❌ Error

| Query | ClickHouse  | ColumnStore  | Doris       | StarRocks   | TiDB          |
|-------|-------------|--------------|-------------|-------------|---------------|
| 1     | 🟢 0.13 sec | ❌ Error     | 🟢 0.19 sec | 🟢 0.13 sec | 🟡 0.53 sec   |
| 2     | 🟡 0.54 sec | 🟠 6.35 sec  | 🟡 0.53 sec | 🟡 0.99 sec | 🟡 0.94 sec   |
| 3     | 🟢 0.05 sec | ❌ Error     | 🟢 0.05 sec | 🟢 0.08 sec | 🟢 0.12 sec   |
| 4     | 🟢 0.01 sec | ❌ Error     | 🟢 0.02 sec | 🟢 0.05 sec | 🟢 0.06 sec   |
| 5     | 🟢 0.10 sec | 🟡 0.99 sec  | 🟢 0.06 sec | 🟢 0.08 sec | 🟢 0.37 sec   |
| 6     | 🟡 0.66 sec | 🔴 15.03 sec | 🟡 0.88 sec | 🟡 0.57 sec | 🟠 3.72 sec   |
| 7     | 🟡 0.63 sec | 🔴 14.40 sec | 🟡 0.76 sec | 🟢 0.46 sec | 🟠 3.42 sec   |
| 8     | 🟢 0.06 sec | 🟡 1.96 sec  | 🟢 0.11 sec | 🟢 0.08 sec | 🟢 0.40 sec   |
| 9     | 🟡 0.68 sec | 🔴 14.27 sec | 🟡 0.77 sec | 🟢 0.42 sec | 🟠 3.42 sec   |
| 10    | 🟢 0.09 sec | 🟠 2.57 sec  | 🟢 0.14 sec | 🟢 0.11 sec | 🟢 0.46 sec   |
| 11    | 🟢 0.22 sec | 🟠 4.04 sec  | 🟢 0.28 sec | 🟢 0.19 sec | 🟡 0.99 sec   |
| 12    | 🟡 0.81 sec | 🔴 20.81 sec | 🟡 1.07 sec | 🟡 0.62 sec | 🟠 5.31 sec   |
| 13    | 🟢 0.25 sec | 🟠 4.74 sec  | 🟢 0.31 sec | 🟢 0.19 sec | 🟠 2.46 sec   |
| 14    | 🟢 0.49 sec | 🟠 9.58 sec  | 🟢 0.44 sec | 🟢 0.36 sec | 🔴 74.33 sec  |
| 15    | 🟢 0.40 sec | 🟠 9.05 sec  | 🟢 0.48 sec | 🟢 0.27 sec | 🔴 147.88 sec |
| 16    | 🟢 0.24 sec | 🟠 6.32 sec  | 🟢 0.33 sec | 🟡 0.90 sec | 🟠 3.44 sec   |
| 17    | 🟢 0.16 sec | ❌ Error     | 🟢 0.16 sec | 🟢 0.14 sec | 🟢 0.22 sec   |
| 18    | 🟢 0.33 sec | 🟠 5.74 sec  | 🟢 0.40 sec | 🟡 0.66 sec | 🟠 2.89 sec   |
| 19    | 🟢 0.31 sec | ❌ Error     | 🟢 0.38 sec | 🟢 0.27 sec | 🔴 59.05 sec  |
| 20    | 🟢 0.25 sec | 🟠 5.36 sec  | 🟢 0.30 sec | 🟢 0.19 sec | 🟠 2.77 sec   |

Key Observations

Performance Leaders

  • ClickHouse: Exceptional performance across all query types with 100% success rate and consistently fast execution times. Particularly strong for simple aggregations and filtering operations, with excellent scalability to complex analytical queries.
  • StarRocks: Outstanding performance with 100% query success rate and highly competitive execution times. Demonstrates excellent consistency across different query patterns with a total execution time of 6.76 seconds.
  • Apache Doris: Balanced performance characteristics suitable for general analytical workloads, with full query compatibility.
  • Data load performance varies widely: MariaDB ColumnStore and ClickHouse are fastest for ingestion, while TiDB/TiFlash is the slowest due to its architecture.

System-Specific Insights

ClickHouse Analysis: Demonstrates outstanding performance with perfect 100% query success rate (20/20 queries) and fastest overall execution time (6.41 sec). Shows excellent optimization for both simple and complex analytical patterns, making it ideal for high-performance analytical workloads. Also among the fastest for bulk data ingestion (57.00 sec).

StarRocks Analysis: Delivers exceptional performance with perfect 100% query success rate (20/20 queries) and highly competitive total execution time (6.76 sec). Shows excellent vectorized execution capabilities and strong optimization for both simple and complex analytical patterns, making it a strong contender for high-performance analytical workloads. Data load speed is also competitive (100.00 sec).

TiDB/TiFlash Analysis: Achieves 100% query success rate (20/20 queries) with total execution time of 312.78 sec. While TiDB shows slower analytical performance compared to purpose-built OLAP systems, particularly on complex queries (queries 14, 15, and 19), it provides the unique advantage of unified HTAP capabilitiesβ€”enabling transactional and analytical workloads on the same dataset without ETL processes. Performance varies significantly by query complexity, with simple queries executing competitively but complex analytical patterns taking substantially longer. Data load speed for TiDB/TiFlash is significantly slower (735.00 sec) than other engines, reflecting the overhead of TiDB Lightning bulk import and TiFlash replica creation. This is a tradeoff for unified HTAP capabilities and strong transactional consistency. TiDB/TiFlash remains a viable choice for organizations prioritizing hybrid transactional/analytical workloads and seamless data integration, but is not recommended for time-sensitive bulk analytical ingestion.

MariaDB ColumnStore: Shows moderate compatibility challenges with complex analytical queries, achieving 75% success rate (15/20 queries). Failed queries primarily involve advanced CTEs and complex window functions, indicating some limitations with modern SQL analytical patterns. When successful, performance is generally slower than purpose-built OLAP systems, with total execution time of 121.21 sec for successful queries. Despite the fastest data load time (39.00 sec), query compatibility and execution speed lag behind OLAP-focused systems.

Apache Doris: Demonstrates excellent compatibility with 100% query success rate and balanced performance characteristics across all query complexity levels, making it suitable for comprehensive analytical workloads. Data load speed is competitive (85.00 sec).

Performance Summary

| Database     | Query Time* | Success Rate | Failed Queries  | Load Time  | Best For               |
|--------------|-------------|--------------|-----------------|------------|------------------------|
| ClickHouse   | 6.41 sec    | 100%         | None            | 57.00 sec  | High-perf analytics    |
| StarRocks    | 6.76 sec    | 100%         | None            | 100.00 sec | Consistent performance |
| Doris        | 7.66 sec    | 100%         | None            | 85.00 sec  | Balanced workloads     |
| TiDB/TiFlash | 312.78 sec  | 100%         | None            | 735.00 sec | HTAP scenarios         |
| ColumnStore  | 121.21 sec  | 75%          | 1, 3, 4, 17, 19 | 39.00 sec  | Legacy integration     |

*Query Time: total for successful queries only

Limitations & Considerations

  • Results reflect single-user scenarios and may not represent concurrent workload performance
  • Container-based deployment may not reflect optimized production configurations
  • Dataset size and query patterns may not represent all analytical workload characteristics
  • Performance results are specific to the tested hardware and software configurations

Contributing

We welcome contributions to improve this benchmark suite! Here's how you can help:

Ways to Contribute

  • Add New Databases: Implement connectors for additional OLAP systems
  • Expand Query Suite: Add more analytical query patterns
  • Improve Documentation: Enhance setup guides and analysis
  • Report Issues: Submit bug reports or performance inconsistencies
  • Add Test Cases: Contribute additional datasets or scenarios

Getting Started

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/new-database)
  3. Make your changes and test thoroughly
  4. Submit a pull request with detailed description

Development Guidelines

  • Follow existing code patterns and naming conventions
  • Add comprehensive documentation for new features
  • Include performance benchmarks for new database integrations
  • Ensure Docker configurations are properly tested

Contact: For questions or collaboration opportunities, please open an issue or reach out via GitHub discussions.

FAQ

Common Questions

Q: Why do some queries fail on certain databases? A: Different databases have varying SQL compatibility. ColumnStore, for example, has limitations with advanced CTEs and window functions. All failed queries are clearly marked with ❌ in the results.

Q: Can I run this benchmark on my own hardware? A: Yes! The benchmark is designed to be portable. Results will vary based on your hardware configuration, but relative performance comparisons should remain consistent.

Q: How often are these benchmarks updated? A: We aim to update benchmarks quarterly or when major database versions are released. Check the commit history for the latest updates.

Q: Why is TiDB slower than other databases? A: TiDB is primarily an HTAP system designed for transactional workloads. Its analytical performance via TiFlash is slower but offers the unique advantage of unified transactional and analytical processing.

Q: Can I add my own queries to the benchmark? A: Absolutely! Add your queries to the queries/sql/ directory and they'll be automatically included in the benchmark run.
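
For example (the file name is illustrative; follow the naming pattern of the existing files in queries/sql/):

# Add a query and re-run a single engine
cp my_analysis.sql queries/sql/
python3 run_benchmarks.py --database clickhouse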

Troubleshooting

Issue: "No available capacity" errors with Doris/StarRocks
Solution: Backend nodes need time to initialize. Wait 2-3 minutes or restart containers.

Issue: ClickHouse connection failures
Solution: Ensure the client is installed and the container is fully started before running benchmarks.

Issue: TiDB data loading is slow
Solution: TiFlash needs time to create columnar replicas. This is normal and improves query performance.

Issue: "Couldn't find a tree builder with the features you requested: xml" error during data download
Solution: This indicates lxml is missing. Run pip install -r requirements.txt again, which now includes lxml for XML parsing support.

Related Projects

This benchmark complements established analytical benchmarking efforts by focusing specifically on Docker-deployable, on-premises solutions with real-world transportation data, providing a practical evaluation framework for organizations considering self-hosted analytical databases.

License

This project is licensed under the MIT License - see the LICENSE file for details.


This benchmark provides a foundation for analytical database evaluation. Production deployment decisions should consider additional factors including operational requirements, integration capabilities, support ecosystems, and total cost of ownership beyond raw performance metrics.
