ifLabX/Awesome-LLMOps


A Curated Guide to Awesome LLMOps Projects (2025 Edition)

Maintenance policy
This list is reviewed quarterly (January – April – July – October).
Projects must be (1) actively maintained within the last 6 months, (2) released under a permissive open-source license, and (3) either have ≥ 300 GitHub stars or demonstrable industry adoption.
Items that drop below these thresholds move to a watch-list and may be removed in the next cycle.
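
The three criteria above can be read as a simple predicate. The following is a minimal sketch only, not tooling that this repository ships: the `Project` fields, the `PERMISSIVE_LICENSES` set, and the 183-day approximation of "6 months" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Assumption: the policy does not enumerate licenses, so this set is illustrative.
PERMISSIVE_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause"}


@dataclass
class Project:
    name: str
    last_commit: date
    license_id: str
    stars: int
    industry_adoption: bool = False


def meets_policy(project: Project, today: date) -> bool:
    """Return True if the project satisfies all three listing criteria."""
    maintained = today - project.last_commit <= timedelta(days=183)  # ~6 months (assumed)
    permissive = project.license_id in PERMISSIVE_LICENSES
    popular = project.stars >= 300 or project.industry_adoption
    return maintained and permissive and popular


def triage(projects, today):
    """Split project names into (still listed, moved to the watch-list)."""
    listed = [p.name for p in projects if meets_policy(p, today)]
    watch_list = [p.name for p in projects if not meets_policy(p, today)]
    return listed, watch_list
```

Calling `triage(projects, date.today())` once per review cycle would yield the candidates for the watch-list; whether an item is then removed remains a maintainer decision.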


Introduction to LLMOps

LLMOps (Large Language Model Operations) is a specialization of MLOps focused on managing the full lifecycle of LLM-powered applications. As organizations move from experimenting with LLMs to deploying them in production, they face hurdles that traditional MLOps practices do not fully address: complex prompt engineering, continuous fine-tuning, Retrieval-Augmented Generation (RAG) pipelines, high inference compute costs, and LLM-specific failure modes such as hallucinations, toxicity, and data-privacy leakage.

LLMOps provides the principles, practices, and tools necessary to build, deploy, and maintain these applications in a reliable, scalable, and efficient manner. This guide organizes a curated list of high-relevance, open-source tools according to the core stages of the LLMOps lifecycle, providing a top-down workflow from initial concept to production monitoring.



Phase 1 – Development & Experimentation

Goal: Rapidly iterate on ideas, data, and prompts to prove technical feasibility.
Description: These tools help collect, clean, version, and explore data; craft and test prompts; prototype agents; and keep experiments reproducible.

1.1 Data Versioning & Governance

Goal: Make datasets reproducible and auditable across the project’s lifetime.
Description: Git-style version control and labeling frameworks ensure data integrity and provenance.

| Project | Details |
| --- | --- |
| DVC | Data Version Control: Git for data and models, plus ML experiment management. |
| deeplake | Data lake for deep learning. Build, manage, query, version, and visualize datasets; stream data in real time to PyTorch/TensorFlow. |
| LakeFS | Git-like capabilities for your object storage. |
| Cleanlab | The standard data-centric AI package for data quality and machine learning with messy, real-world data and labels. |
| Label Studio | A multi-type data labeling and annotation tool with a standardized output format. Essential for creating high-quality datasets. |

1.2 Vector Stores & RAG Tooling

Goal: Store and retrieve embeddings efficiently for Retrieval-Augmented Generation.
Description: RAG platforms and vector databases manage unstructured knowledge and power hybrid search.

| Project | Details |
| --- | --- |
| RagFlow | An open-source RAG application that provides a streamlined workflow based on deep document understanding. |
| FastGPT | An LLM-based platform that lets you build your own knowledge-base QA application with out-of-the-box capabilities. |
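
To make the retrieval step concrete, here is a minimal sketch of embedding storage and nearest-neighbour lookup by cosine similarity. It is not how any project listed above is implemented: `TinyVectorStore` is a hypothetical name, the embeddings are hand-written toy vectors rather than model outputs, and production systems typically replace the linear scan with an approximate nearest-neighbour index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Hypothetical in-memory store: keeps (doc_id, embedding) pairs in a list."""

    def __init__(self):
        self._items = []

    def add(self, doc_id, embedding):
        self._items.append((doc_id, list(embedding)))

    def search(self, query, k=3):
        """Return the ids of the k stored embeddings most similar to the query."""
        scored = [(cosine_similarity(query, emb), doc_id) for doc_id, emb in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [doc_id for _, doc_id in scored[:k]]


store = TinyVectorStore()
store.add("cats", [1.0, 0.0, 0.0])
store.add("dogs", [0.9, 0.1, 0.0])
store.add("tax", [0.0, 0.0, 1.0])
top_two = store.search([1.0, 0.0, 0.0], k=2)
```

In a RAG pipeline, the documents behind the returned ids would then be inserted into the prompt as context for the model.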

1.3 Document Processing & Data Cleaning

Goal: Convert raw files and web sources into high-quality, LLM-ready text.
Description: ETL, parsing, and adversarial augmentation frameworks enhance data variety and robustness.

| Project | Details |
| --- | --- |
| Data-Juicer | A one-stop data processing system for LLMs, used to build diverse, high-quality data recipes for pre-training and fine-tuning. |
| Firecrawl | An API service that crawls any URL and converts it into clean, LLM-ready Markdown or structured data. |
| OneFileLLM | A CLI tool that aggregates and preprocesses data from multiple sources (files, GitHub, web) into a single text file for LLM use. |
| Apache Tika | A content detection and analysis framework that extracts text and metadata from a wide variety of file formats. |
| Unstructured | Open-source libraries and APIs for building custom data transformation pipelines for ETL, LLMs, and data analysis. |
| DeepKE | A deep-learning-based knowledge extraction toolkit supporting named entity, relation, and attribute extraction. |
| Lilac | An open-source tool that helps you see and understand your unstructured text data. Explore, cluster, clean, and enrich datasets for LLMs. |
| TextAttack | A Python framework for adversarial attacks, data augmentation, and hard-negative generation to improve robustness. |

1.4 Prompt Engineering & Optimization

Goal: Design, test, and version prompts for consistent, high-quality outputs.
Description: These tools provide A/B testing, genetic search, and interactive sandboxes for rapid iteration.

| Project | Details |
| --- | --- |
| promptfoo | Open-source tool for testing and evaluating prompt quality. |
| Agenta | An open-source LLMOps platform with tools for prompt management, evaluation, and deployment. |
| DSPy | A framework for programming, not just prompting, language models. It lets you optimize both prompts and weights. |
| Chainlit | Build and share conversational UIs in seconds; well suited to interactive prompt sandboxing and demos. |

1.5 Experiment Tracking

Goal: Record, compare, and reproduce experiments across data, prompts, and models.
Description: Track metrics, parameters, and artifacts; integrate with CI to enable data-driven decisions.

| Project | Details |
| --- | --- |
| MLflow | An open-source framework for the end-to-end machine learning lifecycle that helps developers track experiments, evaluate models and prompts, and more. |
| Weights & Biases | A developer-first MLOps platform for experiment tracking, dataset versioning, and model management. Includes W&B Prompts for visualizing LLM execution flow. |
| Aim | An easy-to-use and performant open-source experiment tracker. |

1.6 LLM Evaluation

Goal: Quantify performance, robustness, and safety of prompts and models.
Description: Local and cloud frameworks automate scoring for RAG, summarization, Q&A, and more.

| Project | Details |
| --- | --- |
| LangWatch | Visualize LLM evaluation experiments and DSPy pipeline optimizations. |
| Arize-Phoenix | ML observability for LLMs, vision, language, and tabular models. Also offers local evaluation capabilities. |
| Evidently | An open-source framework to evaluate, test, and monitor ML and LLM-powered systems. |
| Ragas | RAG evaluation metrics and pipelines for faithfulness and answer relevancy. |
| OpenAI Evals | Reference harness for benchmarking GPT-style models across tasks. |

1.7 Agent / App Frameworks

Goal: Compose prompts, tools, and workflows into full-stack LLM applications.
Description: High-level SDKs and low-code builders accelerate agent development and experimentation.

| Project | Details |
| --- | --- |
| LangChain | Building applications with LLMs through composability. |
| LlamaIndex | Provides a central interface to connect your LLMs with external data. |
| Dify | An open-source LLM app development platform for building and operating generative AI-native applications. |
| Flowise | Drag & drop UI to build your customized LLM flow using LangchainJS. |
| OpenChat | Open-source ChatGPT alternative. A robust and extensible open platform for conversational AI, agentic workflows, and custom plugins. |
| MaxKB | An extensible, self-hosted, open-source knowledge base and conversational agent platform for RAG, workflow automation, and personal/private GPTs. |

1.8 Pipeline Orchestration

Goal: Automate batch and streaming workflows for data ingestion, fine-tuning, and evaluation.
Description: DAG-based schedulers and function-graph frameworks ensure reproducible, modular pipelines.

| Project | Details |
| --- | --- |
| Apache Airflow | A platform to programmatically author, schedule, and monitor workflows. Well suited to orchestrating batch jobs such as fine-tuning or RAG indexing. |
| Apache NiFi | An easy-to-use, powerful, and reliable system to process and distribute data. Well suited to real-time, streaming data pipelines for RAG. |
| ZenML | MLOps framework to create reproducible pipelines for ML and LLM workflows. |
| Hamilton | A lightweight framework to represent ML/language model pipelines as a series of Python functions. |
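
The idea behind DAG-based scheduling can be sketched with the Python standard library alone. This is an illustration of dependency ordering only, not the API of any tool in the table above; the step names in `PIPELINE` are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
PIPELINE = {
    "ingest": set(),
    "clean": {"ingest"},
    "embed": {"clean"},
    "fine_tune": {"clean"},
    "evaluate": {"embed", "fine_tune"},
}


def execution_order(dag):
    """Return one valid order in which every step runs after its dependencies."""
    return list(TopologicalSorter(dag).static_order())


def run(dag, tasks):
    """Call each step's function in dependency order and collect the results."""
    return {step: tasks[step]() for step in execution_order(dag)}


order = execution_order(PIPELINE)
```

If the graph contains a cycle, `static_order()` raises `graphlib.CycleError`. Real orchestrators add scheduling, retries, and parallel execution of independent steps (here, `embed` and `fine_tune`) on top of this ordering.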

1.9 Text-to-SQL & Database Agents

Goal: Translate natural-language queries to SQL and unlock structured data for business users.
Description: These tools combine LLMs with schema discovery and query execution to generate accurate, safe SQL across diverse databases.

| Project | Details |
| --- | --- |
| Chat2DB | AI-augmented SQL client: natural-language to SQL, visualization, and reporting. |
| Vanna.ai | Python-based framework for schema-aware text-to-SQL and RAG-enhanced analytics. |
| DB-GPT | Private, self-hosted text-to-SQL agent framework with RAG support. |

1.10 LLM Web Clients & Chat UIs

Goal: Provide user-friendly, open-source frontends for ChatGPT-compatible and self-hosted LLMs, with multi-backend support, plugin systems, knowledge base, and teamwork features.
Description: These projects make it easy to interact with LLMs from web browsers and mobile devices, enabling team or personal usage, plugin integration, and knowledge management.

| Project | Details |
| --- | --- |
| ChatGPT-Next-Web | Open-source ChatGPT web UI. Supports multiple LLM backends, fast deployment, personal/private use, and advanced features. |
| Open WebUI | Modern, extensible, self-hosted UI for local or remote LLMs. Supports Ollama, OpenAI, and more, with teamwork and plugin support. |
| Chatbot UI | ChatGPT-style open-source web UI for connecting to OpenAI and compatible APIs; extensible and customizable for personal use. |
| LobeChat | An open-source, extensible ChatGPT web UI with team workspaces, a plugin ecosystem, and multi-LLM support (OpenAI, Azure, Google, Anthropic, Ollama, etc.). |
| NeatChat | Minimal, clean, privacy-friendly ChatGPT web UI. Supports OpenAI, Azure, local LLMs, and a Markdown knowledge base. |

Phase 2 – Model Adaptation

Goal: Specialize general-purpose LLMs to domain-specific tasks while controlling compute and data cost.
Description: Parameter-efficient fine-tuning and editing techniques inject new knowledge and correct errors without full retraining.

2.1 PEFT & LoRA

| Project | Details |
| --- | --- |
| LlamaFactory | A unified, efficient fine-tuning framework for over 100 LLMs and VLMs. |
| Swift (modelscope) | A framework for fine-tuning and deploying 500+ LLMs and 200+ MLLMs, with extensive support for PEFT techniques. |
| peft | State-of-the-art Parameter-Efficient Fine-Tuning. |
| QLoRA | Finetune a 65 B parameter model on a single 48 GB GPU while preserving full 16-bit finetuning task performance. |
| axolotl | A tool designed to streamline the fine-tuning of various AI models. |
| LoRA-Hub | Community marketplace and registry for sharing and discovering LoRA weight adapters. |
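
A little arithmetic shows why these techniques control compute cost. The sketch below is a back-of-the-envelope estimate under stated assumptions, not a measurement: it counts only the two low-rank LoRA factors for a single weight matrix, and only the memory of the base weights, ignoring activations, optimizer state, adapter weights, and quantization overhead. The 4096 x 4096 layer size and rank 8 are illustrative values.

```python
def full_params(d_out, d_in):
    """Parameters updated when a d_out x d_in weight matrix is fully fine-tuned."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


def weight_memory_gb(num_params, bits_per_param):
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Illustrative layer: a 4096 x 4096 projection adapted with rank 8.
full = full_params(4096, 4096)                 # 16,777,216
lora = lora_trainable_params(4096, 4096, 8)    # 65,536
fraction = lora / full                         # 0.00390625, i.e. about 0.4%

# Base weights of a 65 B parameter model at two precisions.
mem_16bit = weight_memory_gb(65e9, 16)         # 130.0 GB
mem_4bit = weight_memory_gb(65e9, 4)           # 32.5 GB
```

At 16 bits the weights alone exceed a 48 GB GPU, while at 4 bits they leave headroom for adapters and activations, which is consistent with the QLoRA entry in the table above.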

2.2 Model Editing

| Project | Details |
| --- | --- |
| FastEdit | Helps developers inject fresh, customized knowledge into large language models efficiently. |

Phase 3 – Deployment & Serving

Goal: Deliver low-latency, scalable inference to end users across cloud and edge environments.
Description: Engines, packaging frameworks, and local runtimes optimize throughput, cost, and portability.

3.1 High-Performance Inference & Serving

| Project | Details |
| --- | --- |
| vLLM | A high-throughput and memory-efficient inference and serving engine for LLMs. |
| SGLang | A fast serving framework for LLMs and VLMs, designed for high throughput and controllable, structured generation. |
| TensorRT-LLM | LLM inference engine built on TensorRT for NVIDIA GPUs. |
| Ollama | Serve LLMs locally. A user-friendly application often powered by llama.cpp underneath. |
| llama.cpp | A foundational library for LLM inference in pure C/C++, enabling efficient performance on CPUs and consumer hardware. |

3.2 Model Deployment & Packaging

| Project | Details |
| --- | --- |
| Xinference | A versatile platform to serve language, speech, and multimodal models with a unified, OpenAI-compatible API. |
| BentoML | The Unified Model Serving Framework. |
| OpenLLM | An open platform for operating large language models (LLMs) in production. |
| KServe | Standardized serverless ML inference platform on Kubernetes. |
| Triton Server | The Triton Inference Server provides an optimized cloud and edge inferencing solution. |
| Kubeflow | Machine learning toolkit for Kubernetes, often used for orchestrating deployment pipelines. |

3.3 Edge / Local Runtime

| Project | Details |
| --- | --- |
| llama.cpp | A foundational library for LLM inference in pure C/C++, enabling efficient performance on CPUs and consumer hardware. |
| Ollama | Serve LLMs locally. A user-friendly application often powered by llama.cpp underneath. |

Phase 4 – Operations

Goal: Maintain reliability, cost efficiency, and user safety for live systems.
Description: Observability, guardrails, and policy frameworks provide continuous feedback and protection.

4.1 Observability & Cost Management

| Project | Details |
| --- | --- |
| Helicone | Open-source LLM observability platform for logging, monitoring, and debugging. |
| Portkey-SDK | Control panel with an observability suite and an AI gateway, for shipping fast, reliable, and cost-efficient apps. |
| Langfuse | Open-source LLM engineering platform: traces, evals, prompt management, and metrics to debug and improve your LLM application. |

4.2 Security & Guardrails

| Project | Details |
| --- | --- |
| Guardrails-AI | Declarative, schema-driven validation and content moderation for LLM outputs. |
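
As an illustration of what schema-driven validation of model output means, the sketch below checks a raw response against a declared field-to-type mapping. It deliberately uses only the standard library and does not reproduce the Guardrails-AI API; `SCHEMA`, its field names, and `validate_output` are hypothetical.

```python
import json

# Hypothetical schema: field name -> expected Python type after JSON parsing.
SCHEMA = {"answer": str, "confidence": float, "sources": list}


def validate_output(raw_text, schema=SCHEMA):
    """Parse raw model output as JSON and return (data, errors).

    data is None when the text cannot be parsed into a JSON object;
    errors is an empty list when every declared field is present and well typed.
    """
    try:
        data = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return None, ["top-level value must be a JSON object"]
    errors = []
    for field, expected in schema.items():
        if field not in data:
            errors.append(f"missing field: {field}")
        elif not isinstance(data[field], expected):
            errors.append(f"{field} should be {expected.__name__}")
    return data, errors


ok_data, ok_errors = validate_output('{"answer": "42", "confidence": 0.9, "sources": []}')
bad_data, bad_errors = validate_output('{"answer": 42}')
```

A calling application might retry the model, fall back to a default, or block the response whenever `errors` is non-empty; content moderation checks would be layered on in the same way.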

Phase 5 – Privacy / Governance / Compliance

Goal: Ensure AI systems meet legal, ethical, and organizational standards.
Description: Policy-as-code, bias detection, and continuous validation frameworks enable trustworthy deployment.

| Project | Details |
| --- | --- |
| Giskard | Testing framework dedicated to ML models, from tabular to LLMs. Detects risks of bias, performance issues, and errors. |
| Deepchecks | Tests for continuous validation of ML models and data. |

About

The Latest Awesome series for LLMOps
