A comprehensive toolkit for C++ performance optimization through coverage-driven branch prediction hints and precise benchmarking.
This toolkit consists of multiple interfaces for C++ code optimization:
- A smart PGO (Profile-Guided Optimization) driven tool that automatically injects `[[likely]]` and `[[unlikely]]` branch prediction hints into C++ code based on actual runtime coverage data.
- A high-precision benchmarking utility that measures performance differences between executable versions using Python's `perf_counter_ns()` for microsecond-accurate timing.
- FastAPI Backend: RESTful API with WebSocket support for real-time analysis
- React Frontend: Modern web application for file upload, task monitoring, and results visualization
- covlike-cli: User-friendly command-line wrapper with progress tracking and multiple output formats
┌───────────────────────────────────────────────────────────────────────────────┐
│ covlike optimization workflow │
├───────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌───────┐ run tests ┌─────────────┐ JSON ┌───────────┐ annotate │
│ │ clang │──────────────▶│ llvm-profraw│────────▶│ llvm-cov │───────────────┐│
│ └───────┘ └─────────────┘ └───────────┘ ││
│ ││
│ ┌───────────┐ ┌──────────────┐ ││
│ │ AST edit │───▶│source_pgo.cpp│ ││
│ └───────────┘ └──────────────┘ ││
│ ▲ ││
│ └──────────────────────────┘│
│ │
│ ┌─────────────┐ ┌─────────────┐ microbench ┌────────────────┐ │
│ │ g++ src │ │g++ src_pgo │ ◀──────────────────▶ │ performance │ │
│ │ ↓ │ │ ↓ │ high-precision │ comparison │ │
│ │ original.exe│ │ hinted.exe │ benchmarking │ & statistics │ │
│ └─────────────┘ └─────────────┘ └────────────────┘ │
└───────────────────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────┐
│ User Interfaces │
├─────────────────────────────────────┤
│ │
┌─────────────────┐ │ ┌──────────────┐ ┌─────────────┐ │ ┌──────────────┐
│ │ │ │ Web Frontend │ │ CLI │ │ │ │
│ covlike core │◄────────┤ │ (React) │ │ (Python) │ ├────────►│ Results │
│ tools │ │ └──────────────┘ └─────────────┘ │ │ │
│ │ │ │ │ │ │ │
└─────────────────┘ │ ┌──────────────┐ │ │ └──────────────┘
│ │ Web Backend │ │ │
│ │ (FastAPI) │◄───────┘ │
│ └──────────────┘ │
└─────────────────────────────────────┘
- Intelligent Branch Detection: Finds IF-guards, loop conditions, and switch-case statements
- Coverage-Driven Analysis: Uses real execution data, not static analysis
- Multiple Pattern Support:
- IF-guards (error handling, early exits)
- Loop conditions (for, while, do-while)
- Switch-case labels
- Integrated Benchmarking: Uses microbench for performance measurement
- Flexible Test Input Handling: Supports various test case formats
- High-Precision Timing: Uses `perf_counter_ns()` for microsecond accuracy
- CPU Affinity Support: Pin benchmarks to specific cores via `TASKSET_CORE`
- Statistical Analysis: Median-based results from multiple runs
- Cross-Platform: Works on Linux, Windows, and macOS
- Flexible Input: Supports stdin from bytes, strings, or files
- Library Interface: Use as standalone tool or Python library
- 🌐 Modern Web UI: Intuitive React-based interface with real-time updates
- 📁 File Management: Drag-and-drop upload for C++ files and test cases
- ⚡ Real-time Monitoring: WebSocket-powered progress tracking
- 📊 Interactive Results: Performance charts and side-by-side code comparison
- 🎨 Responsive Design: Works on desktop, tablet, and mobile devices
- 🔄 Task Management: Queue, monitor, and manage multiple optimization tasks
- 🚀 User-Friendly: Simple commands with intelligent defaults
- 📋 Multiple Formats: Output results in text, JSON, or YAML
- 📈 Progress Tracking: Real-time progress bars and status updates
- ⚙️ Configuration: Save and reuse analysis configurations
- 🎯 Validation: Built-in file validation and error checking
- 📝 Detailed Logging: Comprehensive logging with multiple verbosity levels
- 🔧 Flexible Options: Extensive command-line options for customization
- LLVM/Clang toolchain with coverage support
- Python 3.9+ with the following packages:
  - `clang` (Python Clang AST module)
  - `libclang` (Python bindings for Clang)
  - `fastapi`, `uvicorn` (for the web backend)
  - `rich`, `click` (for the CLI interface)
- C++ compiler for release builds (g++ by default)
- Node.js 16+ (for React frontend development)
- Install all Python dependencies:
pip install clang libclang
pip install -r requirements.txt # Install all dependencies
- Ensure LLVM tools are in your PATH or set environment variables:
export LLVM_TOOLS_PREFIX="/usr/bin/" # if needed
export CLANG_LIBRARY_FILE="/usr/lib/libclang.so" # if needed
- Optional: Set CPU affinity for consistent benchmarking:
export TASKSET_CORE=0 # pin to CPU core 0
Start both backend and frontend with one command:
# Run both web backend and frontend
python run_web.py
This will start:
- FastAPI backend on http://localhost:8000
- React frontend on http://localhost:3000
Then open your browser to http://localhost:3000 to use the web interface.
Manual startup (alternative):
# Terminal 1: Start backend
cd web-backend
python run.py
# Terminal 2: Start frontend (requires Node.js)
cd web-frontend
npm install # first time only
npm start
The CLI provides a user-friendly command-line interface with progress tracking and multiple output formats.
# Analyze a C++ file with default settings
covlike-cli analyze source.cpp
# Specify test directory and output format
covlike-cli analyze source.cpp --tests ./tests --format json
# Save configuration for reuse
covlike-cli analyze source.cpp --config-save my-config
# Use saved configuration
covlike-cli analyze source.cpp --config-load my-config
# Dry run (validate without executing)
covlike-cli analyze source.cpp --dry-run
# Custom optimization parameters
covlike-cli analyze source.cpp \
--hot-threshold 0.8 \
--cold-threshold 0.2 \
--runs 10 \
--optimization-level O3
# Multiple output formats
covlike-cli analyze source.cpp --format yaml --verbose
covlike-cli analyze source.cpp --format json --output results.json
# Validation and debugging
covlike-cli validate source.cpp
covlike-cli analyze source.cpp --debug --log-level DEBUG
Command | Description |
---|---|
`analyze` | Perform complete optimization analysis |
`validate` | Validate C++ file without analysis |
`benchmark` | Run benchmarking only |
`config` | Manage saved configurations |
`version` | Show version information |
`help` | Show detailed help |
python covlike.py source.cpp
This will:
- Look for test cases in `source_dir/tests/`
- Generate `source_pgo.cpp` with branch hints
- Run benchmarks comparing the original vs. hinted versions
python covlike.py source.cpp \
--tests ./my_tests \
--hot 0.8 \
--cold 0.2 \
--runs 10 \
--mode 3
Option | Default | Description |
---|---|---|
`source` | - | Path to C++ source file (required) |
`--tests` | `source_dir/tests` | Directory containing test cases |
`--hot` | `0.75` | Threshold for `[[likely]]` (0.0-1.0) |
`--cold` | `0.25` | Threshold for `[[unlikely]]` (0.0-1.0) |
`--runs` | `7` | Number of benchmark iterations |
`--no-benchmark` | `false` | Skip performance benchmarking |
`--mode` | `1` | Output verbosity (0=silent, 1=basic, 2=detailed, 3=both) |
python microbench.py ./program1 ./program2
from microbench import benchmark_pair, get_summary
# Compare two executables
result = benchmark_pair(
exe1="./original",
exe2="./optimized",
runs=10,
stdin=b"test input data",
label1="Original",
label2="Optimized"
)
print(get_summary(result))
from microbench import benchmark, get_summary, merge_benchmark_results
from pathlib import Path
from statistics import median
# Benchmark single executable
times = benchmark("./myprogram", runs=5, stdin=Path("input.txt"))
print(f"Median time: {median(times[-len(times)//2:]):.3f} ms")
# Merge results from multiple test cases
results = []
for test_file in Path("tests").glob("*.in"):
result = benchmark_pair("./v1", "./v2", stdin=test_file)
results.append(result)
merged = merge_benchmark_results(results)
print(get_summary(merged))
covlike supports three test case formats:
tests/
├── test1.in # Input for test case 1
├── test1.out # Expected output (optional)
├── test2.in # Input for test case 2
└── test2.out # Expected output (optional)
tests/
└── expected.out # Single expected output, empty stdin
If tests/ is empty, the program runs once with empty stdin to collect coverage.
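For orientation, the paired `.in`/`.out` layout can be traversed with a few lines of Python. This is only an illustrative sketch (the directory name `tests/` is the convention shown above), not covlike's internal test discovery:

from pathlib import Path

# Illustrative sketch only -- not covlike's internal test discovery.
tests_dir = Path("tests")
for case in sorted(tests_dir.glob("*.in")):
    expected = case.with_suffix(".out")   # expected output is optional
    stdin_bytes = case.read_bytes()       # fed to the program's stdin
    status = expected.name if expected.exists() else "(no expected output)"
    print(f"{case.name} -> {status}, {len(stdin_bytes)} bytes of stdin")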
- Compiles your code with `-fprofile-instr-generate -fcoverage-mapping`
- Runs all test cases while collecting branch execution counts
- Merges coverage data using `llvm-profdata` (a manual equivalent is sketched below)
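For reference, the manual equivalent of this coverage step looks roughly like the sketch below. It assumes `clang++`, `llvm-profdata`, and `llvm-cov` are on your PATH and uses illustrative file names (`solution.cpp`, `tests/test1.in`); it is not covlike's actual implementation:

import os
import subprocess

# 1. Build an instrumented binary.
subprocess.run(["clang++", "-O2", "-fprofile-instr-generate", "-fcoverage-mapping",
                "solution.cpp", "-o", "instrumented"], check=True)

# 2. Run a test case; LLVM_PROFILE_FILE controls where the raw profile is written.
env = dict(os.environ, LLVM_PROFILE_FILE="test1.profraw")
with open("tests/test1.in", "rb") as fin:
    subprocess.run(["./instrumented"], stdin=fin, env=env, check=True)

# 3. Merge raw profiles and export region/branch counts as JSON.
subprocess.run(["llvm-profdata", "merge", "-sparse", "test1.profraw",
                "-o", "merged.profdata"], check=True)
coverage_json = subprocess.run(
    ["llvm-cov", "export", "./instrumented", "-instr-profile=merged.profdata"],
    capture_output=True, text=True, check=True).stdout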
covlike identifies three types of branch patterns:
// Before
if (error_condition) {
return -1; // early exit
}
// After
if (error_condition) [[unlikely]] {
return -1;
}
// Before
while (condition) {
// loop body
}
// After
while (condition) [[likely]] {
// loop body
}
// Before
switch (value) {
case COMMON_CASE:
// hot path
break;
case RARE_CASE:
// cold path
break;
}
// After
switch (value) {
case COMMON_CASE: [[likely]]
// hot path
break;
case RARE_CASE: [[unlikely]]
// cold path
break;
}
- IF-guards: `[[unlikely]]` if the true ratio ≤ cold threshold, `[[likely]]` if ≥ hot threshold
- Loops: `[[likely]]` if the condition stays true ≥ hot threshold, `[[unlikely]]` if ≤ cold threshold
- Switch-cases: based on the relative frequency of each case within the switch statement (see the sketch below)
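In code, these rules amount to a simple threshold check. The sketch below is illustrative only; the function name and the reuse of the hot/cold thresholds for switch cases are assumptions, not covlike's exact internals:

from typing import Optional

# Illustrative sketch of the threshold rules; not covlike's exact internals.
def choose_hint(ratio: float, hot: float = 0.75, cold: float = 0.25) -> Optional[str]:
    """ratio = true-ratio of an if/loop condition, or a case label's relative
    frequency within its switch statement (threshold reuse assumed here)."""
    if ratio >= hot:
        return "[[likely]]"
    if ratio <= cold:
        return "[[unlikely]]"
    return None  # branches in between are left unannotated

print(choose_hint(0.15))  # [[unlikely]]
print(choose_hint(0.89))  # [[likely]]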
- Uses `time.perf_counter_ns()` for nanosecond resolution
- Converts to milliseconds with microsecond precision
- Handles subprocess execution with proper I/O redirection
- Runs multiple iterations (default: 5-10)
- Uses median of last half of runs (reduces noise from cold starts)
- Provides detailed timing statistics
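For intuition, a minimal timing loop along these lines looks like the sketch below (not microbench's actual code; the executable path is illustrative):

import subprocess
import time
from statistics import median

# Minimal timing sketch, not microbench's actual implementation.
def time_runs(exe, runs=7, stdin=b""):
    times_ms = []
    for _ in range(runs):
        start = time.perf_counter_ns()
        subprocess.run([exe], input=stdin, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        times_ms.append((time.perf_counter_ns() - start) / 1e6)  # ns -> ms
    return times_ms

times = time_runs("./original")                              # illustrative path
print(f"median ≈ {median(times[-len(times)//2:]):.3f} ms")   # median of the last half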
export TASKSET_CORE=0 # Linux: taskset -c 0
# Windows: start /affinity 0
▶ running tests & collecting coverage …
total branches in coverage: 45
if_guard line 23: true_ratio=0.15 → [[unlikely]]
loop_cond line 67: true_ratio=0.89 → [[likely]]
switch-case line 102: freq=0.78 (156/200) → [[likely]]
✔ emitted solution_pgo.cpp with coverage-driven hints.
applied 12 attributes
▶ benchmarking on test inputs …
benchmarking with test1.in ...
benchmarking with test2.in ...
── benchmark results ──────────────────
median solution.cpp : 1.234 ms
median solution_pgo.cpp : 1.167 ms
speed-up ≈ 1.06× 🚀 (0.067 ms)
$ python microbench.py ./original ./optimized
median original : 2.456 ms
median optimized : 2.123 ms
speed-up ≈ 1.16× 🚀 (0.333 ms)
from pathlib import Path
from covlike import analyze_coverage_and_inject_hints
results = analyze_coverage_and_inject_hints(
source_path=Path("solution.cpp"),
tests_path=Path("tests"),
hot_threshold=0.8,
cold_threshold=0.2,
runs=5,
run_benchmark=True,
mode=1
)
if results["success"]:
print(f"Generated: {results['hinted_file']}")
print(f"Applied {results['coverage_analysis']['applied_hints']} hints")
# Access detailed analysis
for detail in results["coverage_analysis"]["analysis_details"]:
print(f" {detail['kind']} at line {detail['line']}: {detail.get('hint', 'no hint')}")
# Access benchmark results
if results["benchmark_results"]:
summary = results["benchmark_results"]["summary"]
print(f"Performance: {summary}")
from microbench import benchmark_pair, benchmark, merge_benchmark_results, get_summary
from pathlib import Path
from statistics import median
# Single benchmark
result = benchmark_pair(
exe1="./version_a",
exe2="./version_b",
runs=10,
stdin="test input",
label1="Version A",
label2="Version B"
)
print(f"A median: {result['medians'][result['exe1']]:.3f} ms")
print(f"B median: {result['medians'][result['exe2']]:.3f} ms")
# Multiple test cases
test_results = []
for test_input in Path("inputs").glob("*.txt"):
result = benchmark_pair("./prog1", "./prog2", stdin=test_input)
test_results.append(result)
# Merge all results
combined = merge_benchmark_results(test_results)
print(get_summary(combined))
# Individual executable benchmarking
times = benchmark("./myprogram", runs=20, stdin=b"input data")
print(f"All times: {times}")
print(f"Median: {median(times[-len(times)//2:]):.3f} ms")
# LLVM tools configuration
export LLVM_TOOLS_PREFIX="/opt/llvm/bin/"
export RELEASE_CMD="clang++"
export CLANG_LIBRARY_FILE="/usr/lib/x86_64-linux-gnu/libclang-14.so"
# Benchmarking configuration
export TASKSET_CORE=0 # Pin to specific CPU core for consistent results
- CPU Affinity: Set `TASKSET_CORE` for consistent benchmark results
- Threshold Tuning: Adjust `--hot` and `--cold` based on your specific workload
- Benchmark Runs: Use more runs (e.g., `--runs 15`) for noisy environments
- Test Coverage: Ensure test cases represent real-world usage patterns
- "cannot find libclang"
  - Install: `pip install libclang`
  - Set: `export CLANG_LIBRARY_FILE=/path/to/libclang.so`
- "clang++: command not found"
  - Install the LLVM/Clang toolchain
  - Or set: `export LLVM_TOOLS_PREFIX=/path/to/llvm/bin/`
- "No coverage data collected"
  - Ensure your test cases actually execute the code
  - Check that test files are in the correct format
  - Verify the program compiles and runs without errors
- "Inconsistent benchmark results"
  - Set `TASKSET_CORE` to pin to a specific CPU core
  - Close other applications during benchmarking
  - Increase the number of runs with `--runs`
- "No performance improvement"
  - Try adjusting the `--hot` and `--cold` thresholds
  - Ensure your test cases represent realistic workloads
  - Some codebases may not benefit significantly from branch hints
Use `--mode 2` or `--mode 3` for detailed analysis:
python covlike.py source.cpp --mode 3
This shows exactly which branches were analyzed and what hints were applied.
# Compare generated code
diff -u original.cpp optimized_pgo.cpp
# Manual benchmark verification
python microbench.py ./original ./optimized_pgo
# Check coverage data
llvm-cov show ./instrumented -instr-profile=merged.profdata
def analyze_coverage_and_inject_hints(
source_path: Path,
tests_path: Path = None,
hot_threshold: float = 0.75,
cold_threshold: float = 0.25,
runs: int = 7,
output_suffix: str = "_pgo",
run_benchmark: bool = True,
mode: int = 1,
) -> dict
Returns: Dictionary with keys:
- `success`: bool
- `hinted_file`: Path to the generated file
- `coverage_analysis`: Analysis results
- `benchmark_results`: Performance comparison
- `errors`: List of error messages
def benchmark(exe: Union[str, Path], runs: int = 5, stdin: Union[bytes, str, Path, None] = None) -> List[float]
def benchmark_pair(exe1: Union[str, Path], exe2: Union[str, Path], runs: int = 5, stdin: Union[bytes, str, Path, None] = None, label1: Optional[str] = None, label2: Optional[str] = None) -> Dict[str, object]
def merge_benchmark_results(results_list: List[Dict]) -> Dict
def get_summary(result: Dict) -> str
Akram Rakhmetulla, Jinwoo Jeong, Ahmad Elmoursi
This work is licensed under the GNU Affero General Public License.
See the LICENSE file for details.