Paper (Win Fast or Lose Slow) | Website (Competitive Agents)
Latency Sensitive Benchmarks (LSB) are specifically designed to evaluate LLM Agents in realistic, latency-sensitive scenarios such as competitive games and high-frequency trading. In these tasks, both latency and accuracy jointly determine the final reward (e.g., game win rate or trading yield). Unlike previous benchmarks, LSB introduces two novel tasks that not only assess the intelligence of LLM agents, but also rigorously evaluate the efficiency of the underlying serving systems and algorithms. By integrating latency, accuracy, and real-world reward into a unified framework, LSB pioneers a new direction for benchmarking—encouraging the development of efficient, adaptive, and latency-aware LLM systems and algorithms. We hope our benchmarks and findings inspire the community to move beyond accuracy-centric evaluation and to build LLM solutions that truly excel in real-world, time-critical applications. We invite you to try LSB and join us in advancing this exciting frontier!
- Diverse Benchmarks: LSB offers two cutting-edge benchmarks, competitive gaming (StreetFighter) and a high-frequency trading backtesting system, capturing the essence of real-world, latency-sensitive tasks.
- Flexible Agent Deployment: Provides LLM agent implementations that support local, remote, and API-based serving, enabling comprehensive evaluation across different system architectures.
- System-Aware Evaluation: Highlights how agent performance varies with different serving systems and hardware configurations, offering actionable insights for both algorithm and system optimization.
Experience how LSB can help you benchmark and improve your LLM agents in truly challenging, real-time environments!
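To make the latency-accuracy coupling concrete, here is a toy scoring model (illustrative only; LSB's actual rewards are game win rate and trading yield): a decision scores only if it is both correct and delivered within the environment's deadline, so a fast, slightly less accurate agent can outscore a slow, more accurate one.

```python
def episode_reward(decisions, deadline_ms):
    """Toy reward: each decision scores only if correct AND within the deadline.

    `decisions` is a list of (correct: bool, latency_ms: float) pairs.
    """
    return sum(1 for correct, latency in decisions if correct and latency <= deadline_ms)

# A fast, less accurate agent beats a slow, perfectly accurate one under a 100 ms deadline:
fast = [(True, 90), (False, 80), (True, 95)]    # 2 correct answers, all on time
slow = [(True, 300), (True, 310), (True, 290)]  # 3 correct answers, all late
```

Under a 100 ms deadline, `fast` scores 2 while `slow` scores 0, mirroring how the benchmarks reward speed and intelligence jointly rather than accuracy alone.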
- Install Diambra:
pip install diambra diambra-arena
- Install huggingface, vllm, and sglang.
- Install other relevant dependencies:
pip install loguru llama_index dotenv gymnasium rich openai
- Register your Diambra account here.
- Install the StreetFighter kernel here, and put the zip file (do not unzip it) at $GAME_PATH (wherever you like).
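After installing, a quick sanity check can confirm the Python dependencies are importable (the package names below are taken from the pip commands above; adjust if your environment differs):

```python
import importlib.util

# Packages required by LSB, per the pip install commands above.
REQUIRED = ["diambra", "loguru", "llama_index", "gymnasium", "rich", "openai"]

def missing_packages(names):
    """Return the subset of `names` that cannot be imported in this environment."""
    return [n for n in names if importlib.util.find_spec(n) is None]

if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All LSB dependencies are installed.")
```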
Change $GAME_PATH to the root path of the directory where you put the zip file.
cd ./StreetFighter
python3 diambra -r $GAME_PATH -l python3 run_api.py --serving-choice huggingface --agent1 Qwen/Qwen3-4B --agent2 Qwen/Qwen3-8B --logdir "test.log" --device1 cuda:0 --device2 cuda:1
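In a real-time game, an agent that misses a frame deadline effectively forfeits the action. One common pattern is to guard the model call with a latency budget and fall back to a default action on timeout. A minimal sketch (the budget, fallback action, and `decide_fn` interface are illustrative, not LSB's actual agent API in run_api.py):

```python
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

# Hypothetical fallback; the real action space is defined by the game environment.
FALLBACK_ACTION = "no-op"

def decide_with_budget(decide_fn, budget_s, fallback=FALLBACK_ACTION):
    """Run `decide_fn()` but return `fallback` if it exceeds the latency budget.

    Note: the worker thread is not killed on timeout; its result is simply discarded.
    """
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(decide_fn)
        try:
            return future.result(timeout=budget_s)
        except FutureTimeout:
            return fallback
```

This kind of guard is one way an agent can trade a little accuracy (the fallback) for a hard latency bound.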
cd ./HFTBench
python3 Simulation.py --agent_count 1 --device_list cuda:0
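In the trading benchmark, latency matters because a slow decision fills at a later, often worse, price. The following toy per-tick backtest illustrates the effect; it is an illustration only, not the logic in Simulation.py:

```python
def backtest(prices, signal_fn, latency_ticks):
    """Toy backtest: buy one unit whenever signal_fn says 'buy'.

    A decision made at tick t fills `latency_ticks` later, so slower agents
    pay the price at the delayed fill tick.
    """
    cash, position = 0.0, 0
    for t in range(len(prices)):
        if signal_fn(prices[: t + 1]) == "buy":
            fill = t + latency_ticks          # slower agent -> later fill
            if fill < len(prices):
                cash -= prices[fill]          # pay the (possibly worse) fill price
                position += 1
    return cash + position * prices[-1]       # mark open position to the final price
```

On a rising price series, the same buy signal yields less profit the larger `latency_ticks` is, which is the effect the daily-yield metric captures end to end.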
See more details in the HFTBench and StreetFighter directories.
Here we provide results on two RTX 5090 GPUs. More results on H100 are coming soon.
HFTBench results:

| Model Parameter Size | Avg Bitwidth | Latency (ms) ↓ | Daily Yield (%) ↑ |
|---|---|---|---|
| 14B (ours) | 7.2 | 713 | 26.52 |
| 14B | 8 | 801 | 23.14 |
| 14B | 16 | 1302 | 17.20 |
| 7B | 16 | 619 | -3.28 |
| 7B (ours) | 7.6 | 386 | -7.25 |
| 7B | 8 | 394 | -12.94 |
StreetFighter results:

| Model Parameter Size | Avg Bitwidth | Latency (ms) ↓ | Ranking Score ↑ |
|---|---|---|---|
| 3B (ours) | 6.8 | 195 | 5.99 |
| 7B (ours) | 7.2 | 354 | 2.33 |
| 3B | 8 | 222 | 2.19 |
| 3B | 16 | 349 | 0.25 |
| 7B | 8 | 394 | -0.44 |
| 1.5B | 8 | 142 | -1.25 |
We welcome agent designs tested on any hardware. Please create an issue or pull request that includes your code and the serving hardware you used.
If you find Win Fast or Lose Slow useful or relevant to your research, please cite our paper:
@misc{kang2025winfastloseslow,
title={Win Fast or Lose Slow: Balancing Speed and Accuracy in Latency-Sensitive Decisions of LLMs},
author={Hao Kang and Qingru Zhang and Han Cai and Weiyuan Xu and Tushar Krishna and Yilun Du and Tsachy Weissman},
year={2025},
eprint={2505.19481},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.19481},
}
- Self-defined agents
- Per-tick data trading with multiple agents
- FPX support with sglang and vllm engine