Welcome to LlamaIndex! We’re excited that you want to contribute and become part of our growing community. Whether you're interested in building integrations, fixing bugs, or adding exciting new features, we've made it easy for you to get started.
We use uv as the package and project manager for all the Python packages in this repository. Before contributing, make sure you have uv installed. On macOS and Linux:
curl -LsSf https://astral.sh/uv/install.sh | sh
On Windows:
powershell -ExecutionPolicy ByPass -c "irm https://astral.sh/uv/install.ps1 | iex"
For more install options, see uv's official documentation.
If you're ready to dive in, here’s a quick setup guide to get you going:
- Fork the GitHub repo, clone your fork, and open a terminal at the root of the git repository, llama_index.
- At the root of the repo, run the following command to set up the global virtual environment we use for the pre-commit hooks and the linters:
uv sync
- Navigate to the project folder you want to work on. For example, if you want to work on the OpenAI LLM integration:
cd llama-index-integrations/llms/llama-index-llms-openai
uv will take care of creating and setting up the virtual environment for the specific project you're working on. For example, to run the tests you can do:
uv run -- pytest
- If you want to create the virtual environment explicitly, without running a command through uv run:
uv venv
- To activate the virtual environment:
source .venv/bin/activate
That's it! The package you're working on is already installed in editable mode, so you can start changing the code and running the tests right away.
Once you get familiar with the project, scroll down to the Development Guidelines for more details.
There are plenty of ways to contribute. Whether you're a seasoned Python developer or just starting out, your contributions are welcome! Here are some ideas:
Help us extend LlamaIndex's functionality by contributing to any of our core modules. Think of this as unlocking new superpowers for LlamaIndex!
- New Integrations (e.g., connecting new LLMs, storage systems, or data sources)
- Data Loaders, Vector Stores, and more!
Explore the different modules below to get inspired!
New integrations should meaningfully integrate with existing LlamaIndex framework components. At the discretion of LlamaIndex maintainers, some integrations may be declined.
Create new Packs, Readers, or Tools that simplify how others use LlamaIndex with various platforms.
Have an idea for a feature that could make LlamaIndex even better? Go for it! We love innovative contributions.
Fixing bugs is a great way to start contributing. Head over to our GitHub Issues page and find bugs tagged as good first issue.
If you’ve used LlamaIndex in a unique or creative way, consider sharing guides or notebooks. This helps other developers learn from your experience.
Got an out-there idea? We’re open to experimental features—test it out and make a PR!
Help make the project easier to navigate by refining the docs or cleaning up the codebase. Every improvement counts!
A data loader ingests data from any source and converts it into Document objects that LlamaIndex can parse and index.
- Interface:
  - load_data: Returns a list of Document objects.
  - lazy_load_data: Returns an iterable of Document objects (useful for large datasets).
Example: MongoDB Reader
💡 Ideas: Want to load data from a source not yet supported? Build a new data loader and submit a PR!
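To make the two data loader methods concrete, here is a minimal sketch of a reader. It is an illustration, not the library's implementation: the Document dataclass is a stand-in so the example runs without LlamaIndex installed, and InMemoryRecordReader and its text_key parameter are hypothetical names. A real integration would build on the reader base class and Document type that LlamaIndex provides.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, List


@dataclass
class Document:
    """Stand-in for LlamaIndex's Document (assumption: text plus metadata only)."""

    text: str
    metadata: Dict[str, Any] = field(default_factory=dict)


class InMemoryRecordReader:
    """Hypothetical reader that turns dict records into Document objects."""

    def __init__(self, records: List[Dict[str, Any]], text_key: str = "body") -> None:
        self.records = records
        self.text_key = text_key

    def lazy_load_data(self) -> Iterable[Document]:
        # Yield one Document at a time so a large source need not fit in memory.
        for i, record in enumerate(self.records):
            metadata = {k: v for k, v in record.items() if k != self.text_key}
            metadata["record_index"] = i
            yield Document(text=str(record[self.text_key]), metadata=metadata)

    def load_data(self) -> List[Document]:
        # Eager variant: materialize the lazy iterator into a list.
        return list(self.lazy_load_data())
```

Implementing load_data on top of lazy_load_data keeps the two methods consistent, since there is only one place where records are converted.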
A node parser converts Document objects into Node objects, the atomic chunks of data that LlamaIndex works with.
- Interface:
  - get_nodes_from_documents: Returns a list of Node objects.
Example: Hierarchical Node Parser
💡 Ideas: Add new ways to structure hierarchical relationships in documents, like play-act-scene or chapter-section formats.
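As a rough sketch of what a node parser does, the example below splits each document into one node per paragraph. Document and Node here are simplified stand-ins, and ParagraphNodeParser is a hypothetical name; the real LlamaIndex classes carry more fields (relationships, embeddings, and so on) and a real parser would extend the library's node parser base class.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Document:
    """Stand-in for LlamaIndex's Document (assumption: id plus text only)."""

    doc_id: str
    text: str


@dataclass
class Node:
    """Stand-in for a LlamaIndex node: a chunk that remembers its source."""

    text: str
    source_doc_id: str
    index: int


class ParagraphNodeParser:
    """Hypothetical parser that emits one Node per non-empty paragraph."""

    def get_nodes_from_documents(self, documents: List[Document]) -> List[Node]:
        nodes: List[Node] = []
        for doc in documents:
            # Paragraphs are separated by blank lines; drop empty fragments.
            paragraphs = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
            for i, paragraph in enumerate(paragraphs):
                nodes.append(Node(text=paragraph, source_doc_id=doc.doc_id, index=i))
        return nodes
```

Keeping the source document id and position on each node is what later lets a parser express structure, such as the chapter-section hierarchy mentioned above.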
A text splitter breaks down large text blocks into smaller chunks—this is key for working with LLMs that have limited context windows.
- Interface:
  - split_text: Takes a string and returns smaller strings (chunks).
Example: Token Text Splitter
💡 Ideas: Build specialized text splitters for different content types, like code, dialogues, or dense data!
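The split_text contract can be illustrated with a small splitter that produces overlapping chunks. This is a sketch under simplifying assumptions: it counts whitespace-separated words rather than tokenizer tokens, and WordTextSplitter, chunk_size, and chunk_overlap are illustrative names rather than the signature of any existing LlamaIndex splitter.

```python
from typing import List


class WordTextSplitter:
    """Hypothetical splitter that chunks text by word count with overlap."""

    def __init__(self, chunk_size: int = 5, chunk_overlap: int = 1) -> None:
        if chunk_size <= 0:
            raise ValueError("chunk_size must be positive")
        if not 0 <= chunk_overlap < chunk_size:
            raise ValueError("chunk_overlap must be in [0, chunk_size)")
        self.chunk_size = chunk_size
        self.chunk_overlap = chunk_overlap

    def split_text(self, text: str) -> List[str]:
        words = text.split()
        # Each new chunk starts this many words after the previous one.
        step = self.chunk_size - self.chunk_overlap
        chunks: List[str] = []
        for start in range(0, len(words), step):
            chunks.append(" ".join(words[start : start + self.chunk_size]))
            # Stop once a chunk has reached the end of the text.
            if start + self.chunk_size >= len(words):
                break
        return chunks
```

The overlap repeats a few words at chunk boundaries so that a sentence cut in two still appears with some context in both chunks.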
Vector stores store embeddings and retrieve them via similarity search.
- Interface: add, delete, query, get_nodes, delete_nodes, clear
Example: Pinecone Vector Store
💡 Ideas: Create support for vector databases that aren't yet integrated!
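To show what similarity search over stored embeddings means in practice, here is a toy in-memory store covering add, delete, query, and clear. The method signatures are simplified assumptions made for this sketch (the real LlamaIndex interface works with node and query objects rather than bare ids and lists), and InMemoryVectorStore is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical store that keeps embeddings in a dict keyed by node id."""

    def __init__(self) -> None:
        self._vectors: Dict[str, List[float]] = {}

    def add(self, node_id: str, embedding: Sequence[float]) -> None:
        self._vectors[node_id] = list(embedding)

    def delete(self, node_id: str) -> None:
        # Deleting an unknown id is a no-op.
        self._vectors.pop(node_id, None)

    def clear(self) -> None:
        self._vectors.clear()

    def query(self, embedding: Sequence[float], top_k: int = 2) -> List[Tuple[str, float]]:
        # Score every stored vector, then return the top_k most similar ids.
        scored = [(nid, cosine_similarity(embedding, vec)) for nid, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]
```

A real integration would delegate the scoring to the vector database instead of scanning every vector as this sketch does.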
- Query Engines implement query to return structured responses.
- Retrievers retrieve relevant nodes based on queries.
💡 Ideas: Design fancy query engines that combine retrievers or add intelligent processing layers!
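One way to picture a query engine that combines retrievers is the sketch below. Every class here (Node, Response, KeywordRetriever, CombinedRetriever, SimpleQueryEngine) is a hypothetical stand-in written for illustration; the sketch only assumes that a retriever exposes retrieve and a query engine exposes query, and it concatenates node text where a real engine would synthesize an answer with an LLM.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Node:
    node_id: str
    text: str


@dataclass
class Response:
    answer: str
    source_nodes: List[Node]


class KeywordRetriever:
    """Returns nodes that share at least one lowercase word with the query."""

    def __init__(self, nodes: Sequence[Node]) -> None:
        self.nodes = list(nodes)

    def retrieve(self, query: str) -> List[Node]:
        terms = set(query.lower().split())
        return [n for n in self.nodes if terms & set(n.text.lower().split())]


class CombinedRetriever:
    """Merges results from several retrievers, dropping duplicate node ids."""

    def __init__(self, retrievers: Sequence[KeywordRetriever]) -> None:
        self.retrievers = list(retrievers)

    def retrieve(self, query: str) -> List[Node]:
        seen = set()
        merged: List[Node] = []
        for retriever in self.retrievers:
            for node in retriever.retrieve(query):
                if node.node_id not in seen:
                    seen.add(node.node_id)
                    merged.append(node)
        return merged


class SimpleQueryEngine:
    """Wraps a retriever and returns a structured Response."""

    def __init__(self, retriever: CombinedRetriever) -> None:
        self.retriever = retriever

    def query(self, query: str) -> Response:
        nodes = self.retriever.retrieve(query)
        answer = " ".join(n.text for n in nodes) if nodes else "No relevant nodes found."
        return Response(answer=answer, source_nodes=nodes)
```

Returning the source nodes alongside the answer is what makes the response structured: callers can inspect which chunks the answer was built from.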
- Fork the repository on GitHub.
- Clone your fork to your local machine.
git clone https://github.com/your-username/llama_index.git
- Create a branch for your work.
git checkout -b your-feature-branch
- Set up your environment (follow the Quick Start Guide).
- Work on your feature or bugfix, ensuring you have unit tests covering your code.
- Commit your changes, then push them to your fork.
git push origin your-feature-branch
- Open a pull request on GitHub.
And voilà—your contribution is ready for review!
LlamaIndex is organized as a monorepo, meaning different packages live within this single repository. You can focus on a specific package depending on your contribution:
- Core package: llama-index-core/
- Integrations: e.g., llama-index-integrations/
We use pytest for testing. Make sure you run tests in each package you modify:
uv run -- pytest
If you’re integrating with a remote system, mock it to prevent test failures from external changes.
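As an illustration of mocking a remote system, the sketch below replaces a network-backed client with a Mock from the standard library's unittest.mock. RemoteEmbeddingClient and embed_texts are hypothetical names invented for this example; the point is only that the test never touches the network, so it cannot fail because the remote service changed.

```python
from typing import List
from unittest.mock import Mock


class RemoteEmbeddingClient:
    """Hypothetical client; a real one would call a remote API."""

    def embed(self, text: str) -> List[float]:
        raise RuntimeError("network access is not available in unit tests")


def embed_texts(client: RemoteEmbeddingClient, texts: List[str]) -> List[List[float]]:
    """Code under test: embeds every non-blank text using the client."""
    return [client.embed(t) for t in texts if t.strip()]


def test_embed_texts_skips_blank_input() -> None:
    # spec= restricts the mock to attributes the real client actually has.
    client = Mock(spec=RemoteEmbeddingClient)
    client.embed.return_value = [0.1, 0.2]

    result = embed_texts(client, ["hello", "   ", "world"])

    assert result == [[0.1, 0.2], [0.1, 0.2]]
    assert client.embed.call_count == 2
    client.embed.assert_any_call("hello")
```

pytest collects the test_ function automatically, so a file like this runs with the same uv run -- pytest command shown above.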
By default, CI/CD will fail if test coverage is less than 50%, so please add tests for your code!
We’d love to hear from you and collaborate! Join our Discord community to ask questions, share ideas, or just chat with fellow developers.
Join us on Discord https://discord.gg/dGcwcsnxhU
Thank you for considering contributing to LlamaIndex! Every contribution—whether it’s code, documentation, or ideas—helps make this project better for everyone.
Happy coding! 😊