## Maintenance policy
This list is reviewed quarterly, in January, April, July, and October.
Projects must be (1) actively maintained within the last 6 months, (2) released under a permissive open-source license, and (3) either have ≥ 300 GitHub stars or demonstrable industry adoption.
Items that drop below these thresholds move to a watch-list and may be removed in the next cycle.
LLMOps (Large Language Model Operations) is a specialized discipline of MLOps tailored to the unique challenges of managing the entire lifecycle of LLM-powered applications. As organizations move from experimenting with LLMs to deploying them in production, they face distinct hurdles that traditional MLOps practices do not fully address. These challenges include complex prompt engineering, continuous fine-tuning, managing Retrieval-Augmented Generation (RAG) pipelines, handling high computational costs for inference, and monitoring for specific failure modes like hallucinations, toxicity, and data-privacy leakage.
LLMOps provides the principles, practices, and tools necessary to build, deploy, and maintain these applications in a reliable, scalable, and efficient manner. This guide organizes a curated list of high-relevance, open-source tools according to the core stages of the LLMOps lifecycle, providing a top-down workflow from initial concept to production monitoring.
- Phase 1 – Development & Experimentation
  - 1.1 Data Versioning & Governance
  - 1.2 Vector Stores & RAG Tooling
  - 1.3 Document Processing & Data Cleaning
  - 1.4 Prompt Engineering & Optimization
  - 1.5 Experiment Tracking
  - 1.6 LLM Evaluation
  - 1.7 Agent / App Frameworks
  - 1.8 Pipeline Orchestration
  - 1.9 Text-to-SQL & Database Agents
  - 1.10 LLM Web Clients & Chat UIs
- Phase 2 – Model Adaptation
- Phase 3 – Deployment & Serving
- Phase 4 – Operations
- Phase 5 – Privacy / Governance / Compliance
## Phase 1 – Development & Experimentation

Goal: Rapidly iterate on ideas, data, and prompts to prove technical feasibility.
Description: These tools help collect, clean, version, and explore data; craft and test prompts; prototype agents; and keep experiments reproducible.
### 1.1 Data Versioning & Governance

Goal: Make datasets reproducible and auditable across the project’s lifetime.
Description: Git-style version control and labeling frameworks ensure data integrity and provenance.
| Project | Details | Repository |
|---|---|---|
| DVC | Data Version Control – Git for Data & Models – ML Experiments Management. | |
| deeplake | Data Lake for Deep Learning. Build, manage, query, version, & visualize datasets. Stream data in real-time to PyTorch/TensorFlow. | |
| LakeFS | Git-like capabilities for your object storage. | |
| Cleanlab | The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels. | |
| Label Studio | A multi-type data labeling and annotation tool with a standardized output format. Essential for creating high-quality datasets. | |
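The core idea behind these Git-for-data tools can be sketched in a few lines: a dataset version is a content address, so any change to the records yields a new version id, while identical data always hashes to the same id. A minimal illustration (not any tool's actual API; `dataset_version` is an invented name):

```python
import hashlib
import json

def dataset_version(records: list) -> str:
    """Compute a deterministic content hash for a dataset.

    DVC and LakeFS track data by content address rather than by
    filename, so any edit produces a new version id and unchanged
    data is deduplicated for free.
    """
    canonical = json.dumps(records, sort_keys=True).encode("utf-8")
    return hashlib.sha256(canonical).hexdigest()[:12]

v1 = dataset_version([{"text": "hello", "label": 1}])
v2 = dataset_version([{"text": "hello!", "label": 1}])  # one char changed
```

Because hashing is deterministic, re-running a pipeline on the same data reproduces the same version id, which is what makes audits possible.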
### 1.2 Vector Stores & RAG Tooling

Goal: Store and retrieve embeddings efficiently for Retrieval-Augmented Generation.
Description: RAG platforms and vector databases manage unstructured knowledge and power hybrid search.
| Project | Details | Repository |
|---|---|---|
| RagFlow | An open-source RAG application that provides a streamlined workflow based on deep document understanding. | |
| FastGPT | An LLM-based platform for building your own knowledge-base QA applications with out-of-the-box capabilities. | |
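At its core, the retrieval step these platforms automate is: embed the query, rank stored documents by similarity, and pass the top hits to the LLM as context. The sketch below uses a toy bag-of-words "embedding" and cosine similarity; real vector stores use dense neural embeddings and approximate-nearest-neighbor indexes:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words vector; production systems use dense embeddings.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list, k: int = 2) -> list:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

docs = [
    "LakeFS adds git-like branches to object storage",
    "vLLM is a high-throughput inference engine",
    "RAG retrieves documents to ground LLM answers",
]
top = retrieve("how does RAG ground answers with documents", docs, k=1)
```

The retrieved chunks are then interpolated into the prompt, which is the "augmented generation" half of RAG.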
### 1.3 Document Processing & Data Cleaning

Goal: Convert raw files and web sources into high-quality, LLM-ready text.
Description: ETL, parsing, and adversarial augmentation frameworks enhance data variety and robustness.
| Project | Details | Repository |
|---|---|---|
| Data-Juicer | A one-stop data processing system for LLMs. Used to build diverse, high-quality data recipes for pre-training and fine-tuning. | |
| Firecrawl | An API service that crawls any URL and converts it into clean, LLM-ready Markdown or structured data. | |
| OneFileLLM | A CLI tool to aggregate and preprocess data from multiple sources (files, GitHub, web) into a single text file for LLM use. | |
| Apache Tika | A content detection and analysis framework that extracts text and metadata from a huge variety of file formats. | |
| Unstructured | Open-source libraries and APIs to build custom data transformation pipelines for ETL, LLMs, and data analysis. | |
| DeepKE | A deep learning based knowledge extraction toolkit, supporting named entity, relation, and attribute extraction. | |
| Lilac | An open-source tool that helps you see and understand your unstructured text data. Explore, cluster, clean, and enrich datasets for LLMs. | |
| TextAttack | A Python framework for adversarial attacks, data augmentation, and hard-negative generation to improve robustness. | |
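A common final step after extraction and cleaning is chunking: splitting long documents into overlapping windows so each piece fits an embedding or context budget while preserving continuity at the boundaries. A minimal word-window sketch (the function name and defaults are illustrative, not from any listed tool):

```python
def chunk_text(text: str, max_words: int = 50, overlap: int = 10) -> list:
    """Split cleaned text into overlapping word-window chunks.

    The overlap keeps sentences that straddle a boundary visible
    in both neighboring chunks.
    """
    words = text.split()
    if len(words) <= max_words:
        return [" ".join(words)]
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break
    return chunks
```

Production pipelines usually chunk on semantic boundaries (headings, paragraphs) rather than raw word counts, but the sliding-window idea is the same.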
### 1.4 Prompt Engineering & Optimization

Goal: Design, test, and version prompts for consistent, high-quality outputs.
Description: These tools provide A/B testing, genetic search, and interactive sandboxes for rapid iteration.
| Project | Details | Repository |
|---|---|---|
| promptfoo | Open-source tool for testing & evaluating prompt quality. | |
| Agenta | An open-source LLMOps platform with tools for prompt management, evaluation, and deployment. | |
| DSPy | A framework for programming—not just prompting—language models. It allows you to optimize prompts and weights. | |
| Chainlit | Build and share conversational UIs in seconds; perfect for interactive prompt sandboxing and demos. | |
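The essence of prompt testing is treating a prompt template like code under test: run it against a fixed set of labeled cases and score the outputs, so two template variants can be compared on the same benchmark. A minimal harness in the spirit of these tools (the stub model and function names are invented for illustration):

```python
def run_eval(prompt_template: str, cases: list, model) -> float:
    """Score a prompt template over labeled test cases.

    Each case supplies template variables and an expected substring;
    the score is the fraction of cases the model gets right.
    """
    hits = 0
    for case in cases:
        output = model(prompt_template.format(**case["vars"]))
        if case["expected"].lower() in output.lower():
            hits += 1
    return hits / len(cases)

# Deterministic stub standing in for a real LLM call.
def stub_model(prompt: str) -> str:
    return "positive" if "great" in prompt else "negative"

cases = [
    {"vars": {"review": "great product"}, "expected": "positive"},
    {"vars": {"review": "broke in a day"}, "expected": "negative"},
]
score = run_eval("Classify the sentiment: {review}", cases, stub_model)
```

Running the same cases against two templates (A/B testing) turns prompt tweaking from guesswork into a measurable comparison.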
### 1.5 Experiment Tracking

Goal: Record, compare, and reproduce experiments across data, prompts, and models.
Description: Track metrics, parameters, and artifacts; integrate with CI to enable data-driven decisions.
| Project | Details | Repository |
|---|---|---|
| MLflow | An open-source framework for the end-to-end machine learning lifecycle, helping developers track experiments, evaluate models/prompts, and more. | |
| Weights & Biases | A developer-first MLOps platform for experiment tracking, dataset versioning, and model management. Featuring W&B Prompts for LLM execution flow visualization. | |
| Aim | An easy-to-use and performant open-source experiment tracker. | |
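What these trackers record reduces to a simple schema: each run is a bundle of parameters (prompt version, temperature, dataset hash) and resulting metrics, queryable later. A minimal in-memory sketch of that idea (not MLflow's or W&B's actual API; all names here are invented):

```python
import json

class ExperimentTracker:
    """Minimal run tracker: log params + metrics, query the best run."""

    def __init__(self):
        self.runs = []

    def log_run(self, params: dict, metrics: dict) -> None:
        self.runs.append({"params": params, "metrics": metrics})

    def best_run(self, metric: str) -> dict:
        return max(self.runs, key=lambda r: r["metrics"][metric])

    def export(self) -> str:
        # Real trackers persist to a server or file store; JSON suffices here.
        return json.dumps(self.runs, sort_keys=True)

tracker = ExperimentTracker()
tracker.log_run({"prompt": "v1", "temperature": 0.2}, {"accuracy": 0.71})
tracker.log_run({"prompt": "v2", "temperature": 0.0}, {"accuracy": 0.78})
best = tracker.best_run("accuracy")
```

The payoff comes from discipline: if every prompt or data change is logged as a run, "which change helped?" becomes a query instead of an argument.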
### 1.6 LLM Evaluation

Goal: Quantify performance, robustness, and safety of prompts and models.
Description: Local and cloud frameworks automate scoring for RAG, summarization, Q&A, and more.
| Project | Details | Repository |
|---|---|---|
| LangWatch | Visualize LLM evaluation experiments and DSPy pipeline optimizations. | |
| Arize-Phoenix | ML observability for LLMs, vision, language, and tabular models. Also offers powerful local evaluation capabilities. | |
| Evidently | An open-source framework to evaluate, test and monitor ML and LLM-powered systems. | |
| Ragas | RAG evaluation metrics and pipelines for faithfulness and answer relevancy. | |
| OpenAI Evals | Reference harness for benchmarking GPT-style models across tasks. | |
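To make a metric like faithfulness concrete: it asks how much of an answer is actually supported by the retrieved context. Frameworks such as Ragas judge this with an LLM; the crude token-overlap stand-in below conveys the shape of the computation, not the real implementation:

```python
def faithfulness(answer: str, context: str) -> float:
    """Fraction of answer tokens that appear in the retrieved context.

    A toy proxy for LLM-judged faithfulness: 1.0 means every answer
    token is grounded, 0.0 means none are.
    """
    answer_tokens = set(answer.lower().split())
    context_tokens = set(context.lower().split())
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & context_tokens) / len(answer_tokens)

score = faithfulness(
    "the cache stores key value pairs",
    "the kv cache stores key value pairs for attention",
)
```

Scores like this run over a whole evaluation set turn "the bot hallucinates sometimes" into a number you can track across prompt and model versions.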
### 1.7 Agent / App Frameworks

Goal: Compose prompts, tools, and workflows into full-stack LLM applications.
Description: High-level SDKs and low-code builders accelerate agent development and experimentation.
| Project | Details | Repository |
|---|---|---|
| LangChain | Building applications with LLMs through composability. | |
| LlamaIndex | Provides a central interface to connect your LLMs with external data. | |
| Dify | An open-source LLM app development platform for building and operating generative AI-native applications. | |
| Flowise | Drag & drop UI to build your customized LLM flow using LangchainJS. | |
| OpenChat | Open-source ChatGPT alternative. A robust and extensible open platform for conversational AI, agentic workflows, and custom plugins. | |
| MaxKB | An extensible, self-hosted, open-source knowledge base and conversational agent platform for RAG, workflow automation, and personal/private GPTs. | |
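Underneath these frameworks, an agent is a loop: the model emits either a tool call or a final answer, tool results are appended to the transcript, and the model is called again. A self-contained ReAct-style sketch with a scripted stub in place of a real LLM (tool and function names are invented for illustration):

```python
import json

def calculator(expression: str) -> str:
    # Example tool; real frameworks register many, with typed schemas.
    allowed = set("0123456789+-*/. ()")
    if not set(expression) <= allowed:
        raise ValueError("unsafe expression")
    return str(eval(expression))

TOOLS = {"calculator": calculator}

def run_agent(model, question: str, max_steps: int = 3) -> str:
    """ReAct-style loop: the model either calls a tool or answers."""
    transcript = question
    for _ in range(max_steps):
        step = json.loads(model(transcript))  # model returns a JSON action
        if step["type"] == "final":
            return step["answer"]
        result = TOOLS[step["tool"]](step["input"])
        transcript += f"\nObservation: {result}"
    return "no answer"

# Scripted stub standing in for a real LLM.
def stub_model(transcript: str) -> str:
    if "Observation" not in transcript:
        return json.dumps({"type": "tool", "tool": "calculator", "input": "6*7"})
    return json.dumps({"type": "final", "answer": "42"})

answer = run_agent(stub_model, "What is 6 times 7?")
```

The frameworks above add what this sketch omits: structured tool schemas, memory, retries, and streaming, but the control flow is the same loop.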
### 1.8 Pipeline Orchestration

Goal: Automate batch and streaming workflows for data ingestion, fine-tuning, and evaluation.
Description: DAG-based schedulers and function-graph frameworks ensure reproducible, modular pipelines.
| Project | Details | Repository |
|---|---|---|
| Apache Airflow | A platform to programmatically author, schedule, and monitor workflows. Ideal for orchestrating batch jobs like fine-tuning or RAG indexing. | |
| Apache NiFi | An easy-to-use, powerful, and reliable system to process and distribute data. Well-suited for real-time, streaming data pipelines for RAG. | |
| ZenML | MLOps framework to create reproducible pipelines for ML and LLM workflows. | |
| Hamilton | A lightweight framework to represent ML/language model pipelines as a series of Python functions. | |
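The function-graph idea Hamilton popularized can be sketched compactly: each pipeline node is a plain function whose parameter names refer to other nodes or to pipeline inputs, and a tiny resolver walks the implied DAG. This is an illustrative toy resolver, not Hamilton's actual driver API:

```python
import inspect

def run_pipeline(funcs: dict, inputs: dict) -> dict:
    """Resolve a function DAG: parameter names are node/input names."""
    results = dict(inputs)

    def resolve(name: str):
        if name in results:
            return results[name]
        fn = funcs[name]
        args = {p: resolve(p) for p in inspect.signature(fn).parameters}
        results[name] = fn(**args)
        return results[name]

    for name in funcs:
        resolve(name)
    return results

# Pipeline nodes: each function's parameters name its dependencies.
def cleaned(raw_text: str) -> str:
    return raw_text.strip().lower()

def n_tokens(cleaned: str) -> int:
    return len(cleaned.split())

out = run_pipeline({"cleaned": cleaned, "n_tokens": n_tokens},
                   {"raw_text": "  Hello LLMOps World  "})
```

Because dependencies are declared by naming rather than wiring, every intermediate result is addressable, which is what makes these pipelines easy to test and cache.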
### 1.9 Text-to-SQL & Database Agents

Goal: Translate natural-language queries to SQL and unlock structured data for business users.
Description: These tools combine LLMs with schema discovery and query execution to generate accurate, safe SQL across diverse databases.
| Project | Details | Repository |
|---|---|---|
| Chat2DB | AI-augmented SQL client: natural-language to SQL, visualization, and reporting. | |
| Vanna.ai | Python-based framework for schema-aware text-to-SQL and RAG-enhanced analytics. | |
| DB-GPT | Private, self-hosted text-to-SQL agent framework with RAG support. | |
### 1.10 LLM Web Clients & Chat UIs

Goal: Provide user-friendly, open-source frontends for ChatGPT-compatible and self-hosted LLMs, with multi-backend support, plugin systems, knowledge base, and teamwork features.
Description: These projects make it easy to interact with LLMs from web browsers and mobile devices, enabling team or personal usage, plugin integration, and knowledge management.
| Project | Details | Repository |
|---|---|---|
| ChatGPT-Next-Web | Open-source ChatGPT web UI, supports multiple LLM backends, fast deployment, personal/private use, and advanced features. | |
| Open WebUI | Modern, extensible, and self-hosted UI for local or remote LLMs. Supports Ollama, OpenAI, and more. Teamwork and plugin support. | |
| Chatbot UI | ChatGPT-style open-source web UI for connecting to OpenAI and compatible APIs, extensible and customizable for personal use. | |
| LobeChat | An open-source, extensible ChatGPT web UI. Team workspace, plugin ecosystem, multi-LLM support (OpenAI, Azure, Google, Anthropic, Ollama, etc). | |
| NeatChat | Minimal, clean, and privacy-friendly ChatGPT web UI, supports OpenAI, Azure, local LLMs, and markdown knowledge base. | |
## Phase 2 – Model Adaptation

Goal: Specialize general-purpose LLMs to domain-specific tasks while controlling compute and data cost.
Description: Parameter-efficient fine-tuning and editing techniques inject new knowledge and correct errors without full retraining.
| Project | Details | Repository |
|---|---|---|
| LlamaFactory | A unified, efficient fine-tuning framework for over 100 LLMs and VLMs. | |
| Swift (modelscope) | A framework for fine-tuning and deploying 500+ LLMs and 200+ MLLMs, with extensive support for PEFT techniques. | |
| peft | State-of-the-art Parameter-Efficient Fine-Tuning. | |
| QLoRA | Finetune a 65B-parameter model on a single 48 GB GPU while preserving full 16-bit finetuning task performance. | |
| axolotl | A tool designed to streamline the fine-tuning of various AI models. | |
| LoRA-Hub | Community marketplace and registry for sharing and discovering LoRA weight adapters. | |
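The math behind LoRA-style adapters (the technique underlying peft and QLoRA) is compact: the frozen weight W is augmented with a trainable low-rank product A·B, so the forward pass computes y = x(W + αAB) while only A (d×r) and B (r×d) are trained, which is far fewer parameters than W when r ≪ d. A pure-Python illustration of that arithmetic, not any library's API:

```python
def lora_forward(x, W, A, B, alpha=1.0):
    """y = x @ (W + alpha * A @ B); only A and B would be trained."""
    def matmul(X, Y):
        return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
                 for j in range(len(Y[0]))] for i in range(len(X))]

    def add_scaled(X, Y):
        return [[X[i][j] + alpha * Y[i][j] for j in range(len(X[0]))]
                for i in range(len(X))]

    return matmul(x, add_scaled(W, matmul(A, B)))

# d=2, rank r=1: the adapter holds d*r + r*d numbers instead of d*d.
W = [[1.0, 0.0], [0.0, 1.0]]  # frozen base weight (identity here)
A = [[1.0], [0.0]]            # d x r, trainable
B = [[0.0, 2.0]]              # r x d, trainable
y = lora_forward([[3.0, 4.0]], W, A, B)
```

Because the update is additive, adapters can be merged into W for zero-overhead inference or swapped per task, which is why registries of shareable LoRA weights exist at all.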
| Project | Details | Repository |
|---|---|---|
| FastEdit | FastEdit aims to assist developers with injecting fresh and customized knowledge into large language models efficiently. | |
## Phase 3 – Deployment & Serving

Goal: Deliver low-latency, scalable inference to end users across cloud and edge environments.
Description: Engines, packaging frameworks, and local runtimes optimize throughput, cost, and portability.
| Project | Details | Repository |
|---|---|---|
| vllm | A high-throughput and memory-efficient inference and serving engine for LLMs. | |
| SGLang | A fast serving framework for LLMs and VLMs, designed for high throughput and controllable, structured generation. | |
| TensorRT-LLM | An open-source library for optimizing LLM inference on NVIDIA GPUs with TensorRT. | |
| Ollama | Serve LLMs locally. A user-friendly application often powered by llama.cpp underneath. | |
| llama.cpp | A foundational library for LLM inference in pure C/C++, enabling efficient performance on CPUs and consumer hardware. | |
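Why do engines like vLLM and SGLang achieve such high throughput? A large part is continuous batching: instead of padding a whole batch to its longest request, a finished request's slot is immediately refilled from the queue. The toy utilization model below (assuming an unlimited request queue; all names invented) shows the gap static padding leaves:

```python
def utilization(lengths: list, continuous: bool) -> float:
    """Fraction of batch slots doing useful work per decode step.

    Static batching pads every request to the longest sequence in the
    batch; continuous batching refills a slot the moment its request
    finishes, so (with a full queue) every slot stays busy.
    """
    useful_tokens = sum(lengths)
    if continuous:
        return 1.0  # every slot always holds an active request
    slot_steps = max(lengths) * len(lengths)
    return useful_tokens / slot_steps

# One 100-token request batched with three 10-token requests:
u_static = utilization([100, 10, 10, 10], continuous=False)
```

With these lengths, static batching keeps the GPU under a third busy, which is why request-level scheduling (together with paged KV-cache memory) matters so much for serving cost.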
| Project | Details | Repository |
|---|---|---|
| Xinference | A versatile platform to serve language, speech, and multimodal models with a unified, OpenAI-compatible API. | |
| BentoML | The Unified Model Serving Framework. | |
| OpenLLM | An open platform for operating large language models (LLMs) in production. | |
| Kserve | Standardized Serverless ML Inference Platform on Kubernetes. | |
| Triton Server | The Triton Inference Server provides an optimized cloud and edge inferencing solution. | |
| Kubeflow | Machine Learning Toolkit for Kubernetes, often used for orchestrating deployment pipelines. | |
## Phase 4 – Operations

Goal: Maintain reliability, cost efficiency, and user safety for live systems.
Description: Observability, guardrails, and policy frameworks provide continuous feedback and protection.
| Project | Details | Repository |
|---|---|---|
| Helicone | Open source LLM observability platform for logging, monitoring, and debugging. | |
| Portkey-SDK | Control Panel with an observability suite & an AI gateway — to ship fast, reliable, and cost-efficient apps. | |
| Langfuse | Open Source LLM Engineering Platform: Traces, evals, prompt management and metrics to debug and improve your LLM application. | |
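The mechanics of LLM observability are simple to sketch: wrap every model call so that the prompt, output, latency, and token counts are recorded as a trace event. The decorator below is an illustrative toy, not any platform's SDK (these tools add sessions, nested spans, cost attribution, and dashboards):

```python
import functools
import time

TRACES = []  # real platforms ship these events to a collector

def traced(fn):
    """Log every call to an LLM function: prompt, output, latency, tokens."""
    @functools.wraps(fn)
    def wrapper(prompt, **kwargs):
        start = time.perf_counter()
        output = fn(prompt, **kwargs)
        TRACES.append({
            "name": fn.__name__,
            "prompt": prompt,
            "output": output,
            "latency_ms": (time.perf_counter() - start) * 1000,
            "prompt_tokens": len(prompt.split()),  # crude token proxy
        })
        return output
    return wrapper

@traced
def fake_llm(prompt: str) -> str:
    # Stub standing in for a real model call.
    return "stub completion"

fake_llm("Summarize the release notes")
```

Once every call emits an event like this, cost spikes, latency regressions, and bad outputs can be traced back to the exact prompt that produced them.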
| Project | Details | Repository |
|---|---|---|
| Guardrails-AI | Declarative, schema-driven validation and content moderation for LLM outputs. | |
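Schema-driven validation boils down to: parse the model's structured reply, check it against a declared schema, and reject (or re-ask) on failure. A hand-rolled miniature of the idea; Guardrails-AI expresses the schema declaratively and automates the re-asking loop:

```python
import json

def validate_output(raw: str, required: dict) -> dict:
    """Validate an LLM's JSON reply against a tiny field:type schema.

    Raises ValueError when a field is missing or mistyped, which a
    caller can turn into a retry with the error fed back to the model.
    """
    data = json.loads(raw)
    for field, ftype in required.items():
        if field not in data:
            raise ValueError(f"missing field: {field}")
        if not isinstance(data[field], ftype):
            raise ValueError(f"bad type for {field}")
    return data

reply = '{"sentiment": "positive", "confidence": 0.9}'
parsed = validate_output(reply, {"sentiment": str, "confidence": float})
```

Treating model output as untrusted input, exactly like user input, is the design principle here: nothing downstream should see a reply that failed validation.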
## Phase 5 – Privacy / Governance / Compliance

Goal: Ensure AI systems meet legal, ethical, and organizational standards.
Description: Policy-as-code, bias detection, and continuous validation frameworks enable trustworthy deployment.
| Project | Details | Repository |
|---|---|---|
| Giskard | Testing framework dedicated to ML models, from tabular to LLMs. Detect risks of biases, performance issues and errors. | |
| Deepchecks | Tests for Continuous Validation of ML Models & Data. | |
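One of the simplest checks in this family makes the category concrete: compare a model's positive-prediction rate across demographic groups and flag a large gap. This is a generic fairness statistic, not either tool's implementation (the function name is invented for illustration):

```python
def selection_rate_gap(predictions: list, groups: list) -> float:
    """Max difference in positive-prediction rate across groups.

    A gap near 0 means the model selects all groups at similar rates;
    a large gap is a signal worth investigating, not proof of bias.
    """
    rates = {}
    for g in set(groups):
        idx = [i for i, gg in enumerate(groups) if gg == g]
        rates[g] = sum(predictions[i] for i in idx) / len(idx)
    return max(rates.values()) - min(rates.values())

# Group "a" selected 2/3 of the time, group "b" only 1/3:
gap = selection_rate_gap([1, 1, 0, 0, 1, 0], ["a", "a", "a", "b", "b", "b"])
```

Frameworks like Giskard and Deepchecks run batteries of such checks continuously, in CI and in production, so regressions surface before they become compliance incidents.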