This repository provides an easy-to-use framework for indexing and querying documents with the LlamaIndex library. By following the structure of this code, users can create their own Q&A system that indexes documents, persists the index for later use, and performs efficient querying.
Create a folder named 'data' and place your PDF files inside it. The 'storage' folder will store the indexed data for later use.
This Python code is designed to create a Question & Answer system using LlamaIndex and Streamlit. The system works by indexing a set of documents and then allowing users to query the index for relevant information.
Here’s a breakdown of the key components:
-
Loading Environment Variables:
The code begins by importing the necessary modules and loading environment variables from a .env file using the python-dotenv library. The load_dotenv() function ensures that environment variables (like API keys or file paths) are available to the app.

    from dotenv import load_dotenv

    # Absolute path to the .env file on the author's machine; change it to match your setup.
    load_dotenv('/Users/rahulsharma/Desktop/rag llm/.env')
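To make the effect of load_dotenv() concrete, here is a simplified, standard-library-only sketch of what loading a .env file amounts to: read KEY=VALUE lines and copy them into os.environ without overwriting variables that are already set. The helper name load_env_file and the variable EXAMPLE_API_KEY are made up for illustration; python-dotenv itself handles more cases (quoting rules, variable expansion) than this sketch does.

```python
import os
import tempfile


def load_env_file(path):
    """Copy KEY=VALUE pairs from a file into os.environ (hypothetical helper).

    Variables that are already set are left untouched, which mirrors
    python-dotenv's default of not overriding the environment.
    """
    loaded = {}
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            # Skip blank lines, comments, and lines without an assignment.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            key = key.strip()
            value = value.strip().strip("'\"")  # drop surrounding quotes
            if key and key not in os.environ:
                os.environ[key] = value
                loaded[key] = value
    return loaded


# Demo with a throwaway file; EXAMPLE_API_KEY is a placeholder name.
with tempfile.TemporaryDirectory() as tmp:
    env_path = os.path.join(tmp, ".env")
    with open(env_path, "w", encoding="utf-8") as fh:
        fh.write("# secrets\nEXAMPLE_API_KEY='abc123'\n")
    os.environ.pop("EXAMPLE_API_KEY", None)
    loaded = load_env_file(env_path)
    print(loaded)
```

Calling load_dotenv() with no argument lets python-dotenv search for a .env file on its own, which avoids hard-coding an absolute path.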
-
Setting Up the Index:
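The LlamaIndex library is used to process documents. The VectorStoreIndex stores the document data as vector embeddings, which allows for efficient similarity-based querying. The SimpleDirectoryReader is used to read the documents from the "data" directory.
- If the index doesn't already exist, the documents are read, an index is created, and the index is persisted to the ./storage directory.
- If the index already exists (i.e., it is stored in the ./storage directory), the existing index is loaded for querying.

    import os
    # Import path for llama-index 0.10 and later; older releases import from llama_index.
    from llama_index.core import SimpleDirectoryReader, StorageContext, VectorStoreIndex, load_index_from_storage

    PERSIST_DIR = "./storage"

    if not os.path.exists(PERSIST_DIR):
        # First run: read the documents, build the index, and save it to disk.
        documents = SimpleDirectoryReader("data").load_data()
        index = VectorStoreIndex.from_documents(documents)
        index.storage_context.persist(persist_dir=PERSIST_DIR)
    else:
        # Later runs: load the previously persisted index.
        storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
        index = load_index_from_storage(storage_context)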
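One edge case in the existence check: os.path.exists also returns True for an empty directory, so a storage folder created by hand before the first run would send the code down the load branch with nothing to load. A minimal sketch of a stricter check follows. The names has_persisted_index, get_or_build_index, build_fn, and load_fn are hypothetical, and the two callables stand in for the LlamaIndex build and load calls so that the example runs without the library.

```python
import os
import tempfile


def has_persisted_index(persist_dir):
    """Return True only if persist_dir is a directory with at least one entry."""
    return os.path.isdir(persist_dir) and bool(os.listdir(persist_dir))


def get_or_build_index(persist_dir, build_fn, load_fn):
    """Load a persisted index if one is present, otherwise build and persist it.

    build_fn(persist_dir) should create the index and write it to persist_dir;
    load_fn(persist_dir) should read it back. Both are placeholders.
    """
    if has_persisted_index(persist_dir):
        return load_fn(persist_dir)
    os.makedirs(persist_dir, exist_ok=True)
    return build_fn(persist_dir)


# Stand-in callables that persist a tiny text "index" for demonstration.
def fake_build(persist_dir):
    with open(os.path.join(persist_dir, "index.txt"), "w", encoding="utf-8") as fh:
        fh.write("built")
    return "built"


def fake_load(persist_dir):
    with open(os.path.join(persist_dir, "index.txt"), encoding="utf-8") as fh:
        return "loaded:" + fh.read()


with tempfile.TemporaryDirectory() as tmp:
    storage = os.path.join(tmp, "storage")
    os.makedirs(storage)  # empty folder created by hand before the first run
    first = get_or_build_index(storage, fake_build, fake_load)
    second = get_or_build_index(storage, fake_build, fake_load)
    print(first, second)
```

In the real application, build_fn would wrap the from_documents and persist calls, and load_fn would wrap load_index_from_storage.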
-
Querying the Index:
Once the index is loaded (either created or retrieved), it is ready to be queried. The query_engine object is created from the index, and a sample query ("What are transformers?") is processed. The response from the query is printed.

    query_engine = index.as_query_engine()
    response = query_engine.query("What are transformers?")
    print(response)
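Conceptually, a query against a vector index embeds the question and retrieves the stored chunks whose vectors are most similar to it. The toy sketch below shows only that ranking step, using cosine similarity over hand-written three-dimensional vectors. The chunk texts, the vectors, and the top_k helper are invented for illustration; a real index uses embeddings produced by a model and its own retrieval code.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, chunks, k=2):
    """Return the texts of the k chunks most similar to query_vec."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Each entry pairs a text chunk with a made-up embedding vector.
chunks = [
    ("Transformers use self-attention.", [0.9, 0.1, 0.0]),
    ("Bananas are rich in potassium.", [0.0, 0.2, 0.9]),
    ("Attention weighs token relevance.", [0.7, 0.3, 0.1]),
]
query_vec = [1.0, 0.0, 0.0]  # stand-in embedding for "What are transformers?"
results = top_k(query_vec, chunks)
print(results)
```

The retrieved chunks are then passed to the language model as context for generating the answer.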
-
Flexibility and Persistence:
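The system stores the index in a persistent directory (./storage) so that it doesn’t need to be recreated every time the application runs. Because the index is persisted, it can later be extended with new documents rather than rebuilt from scratch. Note, however, that the setup code reads the data folder only when ./storage does not exist, so documents added afterwards are not indexed automatically: the persisted index has to be updated explicitly or rebuilt (for example, by deleting ./storage).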
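Persisting the index does not by itself tell the application which files in the data folder are new. One possible approach, sketched here with the standard library only, is to keep a small manifest of already-indexed file names and compare it with the folder contents on startup. The manifest file name indexed_files.json and the helper names are assumptions for illustration; they are not part of this repository or of LlamaIndex.

```python
import json
import os
import tempfile


def read_manifest(manifest_path):
    """Return the set of file names recorded as already indexed."""
    if not os.path.exists(manifest_path):
        return set()
    with open(manifest_path, encoding="utf-8") as fh:
        return set(json.load(fh))


def find_new_files(data_dir, manifest_path):
    """List files in data_dir that are not yet in the manifest."""
    current = {
        name for name in os.listdir(data_dir)
        if os.path.isfile(os.path.join(data_dir, name))
    }
    return sorted(current - read_manifest(manifest_path))


def record_indexed(manifest_path, names):
    """Add names to the manifest after they have been indexed."""
    merged = sorted(read_manifest(manifest_path) | set(names))
    with open(manifest_path, "w", encoding="utf-8") as fh:
        json.dump(merged, fh)


with tempfile.TemporaryDirectory() as tmp:
    data_dir = os.path.join(tmp, "data")
    os.makedirs(data_dir)
    manifest = os.path.join(tmp, "indexed_files.json")

    # First run: one document is present and gets recorded.
    open(os.path.join(data_dir, "a.pdf"), "w").close()
    first_run = find_new_files(data_dir, manifest)
    record_indexed(manifest, first_run)

    # Second run: only the newly added document is reported.
    open(os.path.join(data_dir, "b.pdf"), "w").close()
    second_run = find_new_files(data_dir, manifest)
    print(first_run, second_run)
```

Only the files returned by find_new_files would then need to be loaded and added to the existing index, for example through the index's insert method if the installed LlamaIndex version provides it, followed by another call to persist.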
