
Parallel LLM Runner

This project provides a Streamlit-based web interface for loading, running, and comparing multiple LLMs in parallel using Ollama and Python (Streamlit & Requests), with support for dynamic model selection, prompt input, and side-by-side response comparison.

Why This Project?

I've been using Ollama and wanted a simple graphical user interface (GUI) for it. I tried OpenWebUI (a very good product), but it felt too complex for my basic needs, and it requires Docker, which adds extra setup steps and consumes additional memory, disk, and CPU. So I created this project as a lighter option that lets me select multiple models, run them simultaneously, and add or remove models as needed.

Prerequisites

  • Ollama (installed locally from ollama.com)
  • Ollama models (any models you like)
  • Python 3.13.5 or higher
  • pip (Python package installer)

Tested On

  • Python 3.13.5
  • Windows Server 2022 OS
  • Ollama version 0.9.0

Setting Up the Environment

Windows

  1. Open Command Prompt.

  2. Create a virtual environment:

    python -m venv LLM_Parallel_Run
  3. Activate the virtual environment:

    .\LLM_Parallel_Run\Scripts\activate
  4. Install the required packages:

    pip install streamlit requests

Linux

  1. Open a terminal.

  2. Create a virtual environment:

    python3 -m venv LLM_Parallel_Run
  3. Activate the virtual environment:

    source LLM_Parallel_Run/bin/activate
  4. Install the required packages:

    pip install streamlit requests

macOS

  1. Open a terminal.

  2. Create a virtual environment:

    python3 -m venv LLM_Parallel_Run
  3. Activate the virtual environment:

    source LLM_Parallel_Run/bin/activate
  4. Install the required packages:

    pip install streamlit requests

Running the Application

Once the environment is set up and the packages are installed, copy the source file for the view you prefer from this repository (for example, Horizontal View - app.py or Vertical View - app.py), rename it to app.py, then start the Streamlit application with the following commands and open it in your browser.

# Windows
.\LLM_Parallel_Run\Scripts\activate
streamlit run app.py

# macOS and Linux
source LLM_Parallel_Run/bin/activate
streamlit run app.py
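
For reference, here is a minimal, hypothetical sketch (not the repository's actual app.py) of how a Streamlit app can list local Ollama models and query the selected ones in parallel with Requests and a thread pool. The endpoint URL, file name, and helper function names are assumptions for illustration only.

# parallel_llm_sketch.py - hypothetical example, not the repository's app.py
import requests
import streamlit as st
from concurrent.futures import ThreadPoolExecutor

OLLAMA_URL = "http://localhost:11434"  # default local Ollama endpoint (assumption)

def list_models():
    # GET /api/tags returns the models available locally
    resp = requests.get(f"{OLLAMA_URL}/api/tags", timeout=10)
    resp.raise_for_status()
    return [m["name"] for m in resp.json().get("models", [])]

def ask_model(model, prompt):
    # POST /api/generate with streaming disabled returns a single JSON object
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=600,
    )
    resp.raise_for_status()
    return resp.json().get("response", "")

st.title("Parallel LLM Runner (sketch)")
models = st.multiselect("Models", list_models())
prompt = st.text_area("Prompt")

if st.button("Run") and models and prompt:
    # Query all selected models at the same time
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda m: ask_model(m, prompt), models))
    # Show the responses side by side, one column per model
    for col, model, answer in zip(st.columns(len(models)), models, answers):
        col.subheader(model)
        col.write(answer)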

Run without terminal

To run a Streamlit app in a Python virtual environment without opening a terminal, create and run a shortcut or script that activates the virtual environment and starts the app. Here's how to do it on different platforms:


Windows (.bat file):

  1. Create a run_streamlit.bat file with the following content:
@echo off
call C:\path\to\venv\Scripts\activate.bat
streamlit run C:\path\to\your_app.py

  2. Create a .vbs file (e.g., launch_app.vbs) in the same folder with the following content. The final 0 runs the batch file in a hidden window, so the browser opens without a visible terminal.

Set WshShell = CreateObject("WScript.Shell")
WshShell.Run chr(34) & "C:\path\to\run_streamlit.bat" & chr(34), 0
Set WshShell = Nothing

  3. Double-click the .vbs file to launch the app. If closing the browser does not also stop the python and streamlit.exe processes, kill them manually; otherwise they will pile up every time you launch the app.

macOS/Linux (.sh file):

  1. Create a run_streamlit.sh script:
#!/bin/bash
source /path/to/venv/bin/activate
streamlit run /path/to/your_app.py
  2. Make it executable:
chmod +x run_streamlit.sh
  3. Run it by double-clicking it or from a launcher, depending on your desktop environment.

Note

  • The latest scripts in this repo query all selected models at the same time. For quicker output, load your Ollama models before starting Streamlit: by default, Ollama keeps a model loaded for 5 minutes and then unloads it automatically. To keep models loaded longer, set the OLLAMA_KEEP_ALIVE parameter to a duration in minutes (Ex: 30m) or hours (Ex: 4h). With the models already in memory, the simultaneous queries return results much faster (see the warm-up sketch after this list).
  • In the vertical view, the prompt and model selection sit in a vertical layout on the left; in the horizontal view, they sit in a horizontal layout.
  • In the vertical view, you can drag the prompt and model panel to the right or left as needed to give more space to the output window.
  • This Streamlit app can run multiple LLMs at the same time. If the results appear one after another instead of in parallel, set OLLAMA_MAX_LOADED_MODELS = 2 (or as many models as your hardware supports). Refer to the Ollama documentation for how to set it on your OS.
  • If you download a new model while the Streamlit app is running, stop and restart the app; otherwise the new model will not be detected and will not appear in the model selection dropdown.
  • Source files are provided for both the horizontal and vertical views of prompt and model selection. Use whichever you like and rename it to app.py.
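
To warm up models before launching the app, one option (in addition to setting the OLLAMA_KEEP_ALIVE environment variable on the Ollama service) is to send each model a generate request with an empty prompt and a per-request keep_alive duration, which loads the model into memory and keeps it there. A minimal, hypothetical warm-up script follows; the script name and model names are placeholders.

# warm_up_models.py - hypothetical helper, not part of this repository
import requests

OLLAMA_URL = "http://localhost:11434"
MODELS = ["llama3.2", "mistral"]  # replace with the models you actually use

for model in MODELS:
    # A generate request with an empty prompt loads the model into memory;
    # keep_alive controls how long it stays loaded (e.g. "30m" or "4h").
    requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": "", "keep_alive": "4h"},
        timeout=600,
    )
    print(f"Loaded {model}")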

Prompt & Model Selection Horizontal View - Quick Look

[Screenshot: prompt and model selection, horizontal view]

Prompt & Model Selection Vertical View - Quick Look

[Screenshot: prompt and model selection, vertical view]

Authors

Contributing

Please follow the GitHub flow for contributing.

License

This project is licensed under the MIT License. See the LICENSE file for details.
