Hexgen-Flow

The alpha-tuning simulator evaluates the scheduling algorithms proposed in Hexgen-Flow for serving Text-to-SQL queries, each of which typically involves multiple LLM (Large Language Model) inference calls. Users can vary the input trace, the arrival rate, and the GPU deployment settings to analyze how the scheduling algorithm performs.

Features

Simulate Text-to-SQL Queries: The simulator processes input trace files containing LLM inference requests and simulates their execution.
Customizable Configurations: Users can modify input traces, arrival rates, and GPU settings to explore various scenarios.
Heterogeneous GPU Deployment: Supports different GPU types and configurations.
Performance Metrics: Reports latency for each end-to-end Text-to-SQL query and provides detailed statistics.

Getting Started

Prerequisites

Python 3.8 or higher
Required Python packages (listed in simulator/requirements.txt)

Install the dependencies using:

pip install -r simulator/requirements.txt

Running the Simulator

The main script for running the simulator is simulator/cli/start_simulator.py. The example command below uses paths relative to the repository root:

python simulator/cli/start_simulator.py \
    --input ./simulator/cli/input_file_trace3.json \
    --n-engines 4 \
    --arrival-rate 1.0 \
    --trace-output ./result/trace_output.json \
    --stats-output ./result/stats_output.json

Command-Line Arguments

--input: Path to the input trace file (JSON format).
--n-engines: Number of model instances.
--arrival-rate: Arrival rate of Text-to-SQL queries.
--trace-output: Path to save the trace output file.
--stats-output: Path to save the statistics output file.
--SLO: (Optional) Service Level Objective for Text-to-SQL requests.
--alpha: (Optional) Alpha value for the optimized engine.

Input Trace File

The input trace file should be a JSON file containing the input and output lengths of each LLM inference request. Example input files are provided in the simulator/cli/ directory.
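
The exact JSON schema of a trace file is not described here, so it can help to inspect one of the provided examples before writing your own. The following is a minimal sketch that prints a compact type summary of any JSON file; the helper names describe_json and describe_trace_file are hypothetical and not part of the simulator.

```python
import json
from typing import Any


def describe_json(value: Any, max_depth: int = 3) -> str:
    """Return a compact type summary of a parsed JSON value.

    Hypothetical helper for inspection only; it makes no assumption
    about the field names used by the simulator's trace files.
    """
    if isinstance(value, dict):
        if max_depth == 0:
            return "{...}"
        inner = ", ".join(
            f"{key!r}: {describe_json(item, max_depth - 1)}"
            for key, item in value.items()
        )
        return "{" + inner + "}"
    if isinstance(value, list):
        if not value:
            return "[]"
        if max_depth == 0:
            return f"[... x{len(value)}]"
        # Only the first element is described; later elements may differ.
        return f"[{describe_json(value[0], max_depth - 1)} x{len(value)}]"
    # Scalars (str, int, float, bool, None) are reported by type name.
    return type(value).__name__


def describe_trace_file(path: str) -> str:
    """Load a JSON file and summarize its structure."""
    with open(path, "r", encoding="utf-8") as handle:
        return describe_json(json.load(handle))
```

For example, print(describe_trace_file("./simulator/cli/input_file_trace3.json")) shows which keys and value types the example trace uses.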

Output

Trace Output: Contains detailed trace events of the simulation.
Latency Output: Contains the latency results for the simulation, saved as latency{index}.json. This file provides latency metrics for each end-to-end Text-to-SQL query processed under the scheduling algorithm.
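
Per-query latencies are usually summarized with a mean, tail percentiles, and the fraction of queries that meet an SLO. The sketch below assumes the latencies have already been extracted into a plain list of numbers; the layout of latency{index}.json is not documented here, so the loading step is left out. The function name summarize_latencies and the rule that a query meets the SLO when its latency is at most the SLO value are assumptions made for illustration.

```python
import math
from typing import Dict, List, Optional


def summarize_latencies(
    latencies: List[float], slo: Optional[float] = None
) -> Dict[str, float]:
    """Summarize end-to-end latencies (hypothetical helper).

    Assumes `latencies` is a flat, non-empty list of numbers in a
    single consistent unit.
    """
    if not latencies:
        raise ValueError("latencies must be non-empty")
    ordered = sorted(latencies)
    n = len(ordered)

    def percentile(p: float) -> float:
        # Nearest-rank percentile: the smallest sample such that at
        # least p percent of the samples are at or below it.
        rank = max(1, math.ceil(p / 100.0 * n))
        return ordered[rank - 1]

    summary = {
        "count": float(n),
        "mean": sum(ordered) / n,
        "p50": percentile(50),
        "p95": percentile(95),
        "max": ordered[-1],
    }
    if slo is not None:
        # Assumption: a query meets the SLO when latency <= slo.
        summary["slo_attainment"] = sum(1 for x in ordered if x <= slo) / n
    return summary
```

Comparing these summaries across runs gives a quick way to see how a configuration change affects tail latency.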

Customization

Users can modify the start_simulator.py script to:

Experiment with different GPU configurations.
Implement new scheduling algorithms.
Adjust the number of engines or other parameters.

Example Use Cases

Analyze Different Arrival Rates: Change the --arrival-rate parameter to observe how the system performs under varying query loads.
Test Different Input Traces: Use different input trace files with the --input parameter to simulate various Text-to-SQL query patterns.
Experiment with GPU Settings: Modify the --n-engines parameter and the GPU configurations in start_simulator.py to test heterogeneous GPU deployments.
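
The arrival-rate use case above can be scripted as a sweep. The following sketch only builds the command lines, using the flags documented under Command-Line Arguments; the helper names build_command and build_sweep and the per-rate output file naming are assumptions, not part of the simulator.

```python
import sys
from pathlib import Path
from typing import Iterable, List


def build_command(
    input_path: str, n_engines: int, arrival_rate: float, result_dir: str
) -> List[str]:
    """Build one start_simulator.py invocation (hypothetical helper)."""
    # Assumption: tag output files by rate so runs do not overwrite each other.
    tag = f"rate_{arrival_rate:g}"
    out = Path(result_dir)
    return [
        sys.executable,
        "simulator/cli/start_simulator.py",
        "--input", input_path,
        "--n-engines", str(n_engines),
        "--arrival-rate", str(arrival_rate),
        "--trace-output", str(out / f"trace_output_{tag}.json"),
        "--stats-output", str(out / f"stats_output_{tag}.json"),
    ]


def build_sweep(
    input_path: str,
    n_engines: int,
    arrival_rates: Iterable[float],
    result_dir: str = "./result",
) -> List[List[str]]:
    """Build one command per arrival rate, keeping other settings fixed."""
    return [
        build_command(input_path, n_engines, rate, result_dir)
        for rate in arrival_rates
    ]
```

Each command can then be executed from the repository root with subprocess.run(cmd, check=True), for example over build_sweep("./simulator/cli/input_file_trace3.json", 4, [0.5, 1.0, 2.0]).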

Repository Structure

simulator/: Contains the core simulation logic and utilities.
- cli/: Command-line interface scripts.
- core/: Core simulation components, including engines and policies.
- internal/: Internal utilities and configurations.
- profiler/: Profiling tools.
- ui/: User interface utilities for displaying results.

Acknowledgments

This simulator builds upon the codebase from the ETH EASL Scratchpad repository, which provided the foundation for its initial implementation.
