Splitwiser | 2024

Official implementation of Splitwiser: Efficient LM inference with constrained resources.

Authored by: Asad Aali, Adney Cardoza, Melissa Capo

Abstract

Efficient inference of LLMs remains a crucial challenge. Inference consists of two main phases: a compute-intensive prompt computation phase and a memory-intensive token generation phase. Despite existing batching and scheduling techniques, token generation phases fail to fully utilize compute resources, especially compared to prompt computation phases. To address this, we propose Splitwiser, a methodology that splits the two phases of an LLM inference request onto the same GPU, thereby reducing overhead and improving memory access and cache utilization. By eliminating the need to transfer data across devices, Splitwiser aims to minimize network-related overheads. In this paper, we describe the basic structure of our proposed pipeline and share preliminary results and analysis. We implement our proposed multiprocessing design on two widely used and independent LLM inference frameworks: Hugging Face and vLLM.
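To make the two-phase split more concrete, here is a minimal, CPU-only toy sketch (not the authors' implementation) of a prompt-phase worker and a token-phase worker running concurrently and connected by a queue, so that one request can be prefilled while another is being decoded. The functions `prefill` and `decode_step` are hypothetical placeholders for a real model, and Python threads stand in for the separate processes (PyTorch multiprocessing or NVIDIA MPS) that Splitwiser uses to share one GPU.

```python
import queue
import threading

# Hypothetical stand-ins for the two inference phases of a real LLM.
def prefill(prompt: str) -> list[str]:
    """Prompt phase: process the whole prompt at once and return a toy 'KV cache'."""
    return prompt.split()

def decode_step(cache: list[str]) -> str:
    """Token phase: emit one token from the cached state and append it to the cache."""
    token = f"<{len(cache)}>"
    cache.append(token)
    return token

_DONE = object()  # sentinel telling the decode worker to stop

def prefill_worker(prompts, handoff: queue.Queue) -> None:
    # Runs the prompt phase for each request and hands its cache to the decode worker.
    for req_id, prompt in enumerate(prompts):
        handoff.put((req_id, prefill(prompt)))
    handoff.put(_DONE)

def decode_worker(handoff: queue.Queue, results: dict, max_new_tokens: int) -> None:
    # Consumes prefilled requests and generates tokens for each one.
    while True:
        item = handoff.get()
        if item is _DONE:
            break
        req_id, cache = item
        results[req_id] = [decode_step(cache) for _ in range(max_new_tokens)]

def run_split_pipeline(prompts, max_new_tokens: int = 3) -> list[list[str]]:
    """Run both phases concurrently and return generated tokens per request."""
    handoff: queue.Queue = queue.Queue()
    results: dict = {}
    workers = [
        threading.Thread(target=prefill_worker, args=(prompts, handoff)),
        threading.Thread(target=decode_worker, args=(handoff, results, max_new_tokens)),
    ]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return [results[i] for i in range(len(prompts))]

if __name__ == "__main__":
    print(run_split_pipeline(["summarize the report", "list findings"]))
```

In the real system, the benefit comes from overlapping the compute-bound prompt phase of one request with the memory-bound token phase of another on the same GPU. Because of Python's GIL, this toy shows only the hand-off structure between the two phases, not any speedup.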

Code written on top of the official RadAdapt implementation:
RadAdapt: Radiology Report Summarization via Lightweight Domain Adaptation of Large Language Models

Environment

Use these commands to set up a conda environment:

conda env create -f env/environment.yml -n splitwiser
conda activate splitwiser
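After activating the environment, a quick check like the following can confirm that the main libraries are importable and that a GPU is visible. This is a hypothetical helper, not part of the repository. It assumes the environment provides `torch` (needed for the PyTorch multiprocessing script) and possibly `transformers` and `vllm` (the Hugging Face and vLLM back ends named in the abstract). These package names are not read from env/environment.yml, and missing packages are reported rather than raising an error.

```python
import importlib
import importlib.util

# Assumed package names; adjust to match env/environment.yml.
PACKAGES = ("torch", "transformers", "vllm")

def check_environment(packages=PACKAGES) -> dict:
    """Report which packages are importable and whether CUDA is available."""
    report = {name: importlib.util.find_spec(name) is not None for name in packages}
    report["cuda_available"] = False
    if report.get("torch"):
        # Only import torch when it is actually installed.
        torch = importlib.import_module("torch")
        report["cuda_available"] = bool(torch.cuda.is_available())
    return report

if __name__ == "__main__":
    for key, ok in check_environment().items():
        print(f"{key:15s} {'OK' if ok else 'missing'}")
```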

Usage

1. In src/constants.py, set DIR_PROJECT to your own project directory.
2. Run one of the following scripts, setting model and case_id as desired:
   - run_discrete.sh: discrete prompting.
   - run_discrete_mp.sh: discrete prompting using PyTorch multiprocessing.
   - run_discrete_mps.sh: discrete prompting using the NVIDIA Multi-Process Service (MPS).
3. To add your own dataset, follow the format of the files in models/data/.
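run_discrete_mps.sh relies on NVIDIA MPS, which allows kernels from separate processes to run concurrently on one GPU. If the script does not already manage the MPS control daemon (check its contents first, since starting it twice will fail), a hypothetical Python wrapper like the one below can start the daemon before a run and shut it down afterwards. `nvidia-cuda-mps-control -d` and sending `quit` on its standard input are the commands documented by NVIDIA; the helper names, the `dry_run` flag, and launching the script with `bash` are illustrative assumptions. Starting MPS may also require appropriate permissions on shared machines.

```python
import shutil
import subprocess

MPS_CONTROL = "nvidia-cuda-mps-control"

def mps_commands() -> tuple[list[str], list[str]]:
    """Return the argv used to start and to stop the MPS control daemon."""
    start = [MPS_CONTROL, "-d"]  # -d: run the control daemon in the background
    stop = [MPS_CONTROL]         # stopping is done by sending "quit" on stdin
    return start, stop

def run_with_mps(script: str = "run_discrete_mps.sh", dry_run: bool = False) -> list[list[str]]:
    """Start MPS, run the given shell script, then always stop MPS again."""
    start, stop = mps_commands()
    run = ["bash", script]
    if dry_run:
        # Return the planned command sequence without executing anything.
        return [start, run, stop]
    if shutil.which(MPS_CONTROL) is None:
        raise RuntimeError(f"{MPS_CONTROL} not found; is the NVIDIA driver installed?")
    subprocess.run(start, check=True)
    try:
        subprocess.run(run, check=True)
    finally:
        # Equivalent to: echo quit | nvidia-cuda-mps-control
        subprocess.run(stop, input="quit\n", text=True, check=True)
    return [start, run, stop]
```

Calling `run_with_mps(dry_run=True)` returns the three commands without executing them, which is a cheap way to review the sequence before touching the GPU.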

Citation

@article{aali2025splitwiser,
  title={Splitwiser: Efficient LM inference with constrained resources},
  author={Aali, Asad and Cardoza, Adney and Capo, Melissa},
  journal={arXiv preprint arXiv:2505.03763},
  year={2025}
}
