Run local LLMs with Ollama to perform Retrieval-Augmented Generation (RAG) and answer questions based on sample PDFs
Ollama - please download the latest version from https://ollama.com/download
Because the project runs in a Dockerized environment, setting up a Python virtual environment is not necessary. Please ensure that ports 1024 and 8000 are open.
Note: The first time you run the project, it will download the necessary models from Ollama for the LLM and embeddings. This is a one-time setup process and may take some time depending on your internet connection.
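If you prefer to pull the default models ahead of the first run, a minimal sketch using the ollama Python client might look like the following. It assumes Ollama is installed and reachable on the host; the project itself handles this download automatically inside Docker.

```python
# Sketch: optionally pre-pull the default models so the first run starts faster.
# Assumes the `ollama` Python package and a locally reachable Ollama server.
import ollama

for model in ("mistral", "nomic-embed-text"):
    print(f"Pulling {model} ...")
    ollama.pull(model)  # no-op if the model is already present locally
```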
What sets this project apart is its Dockerized environment: a reverse proxy (Nginx) is required so that the application can make the various API calls it needs.
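To illustrate the idea, the application container can talk to Ollama through a single proxied endpoint instead of addressing the Ollama host directly. The snippet below is only a sketch: the host name, the port, and the use of the ollama Python client are assumptions, not the project's actual wiring.

```python
# Sketch: point an Ollama client at the Nginx reverse proxy rather than
# at the Ollama server directly. The proxy URL below is hypothetical.
from ollama import Client

client = Client(host="http://nginx:1024")  # assumed proxied endpoint
reply = client.chat(
    model="mistral",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(reply["message"]["content"])
```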
Please run
docker-compose build
and then run
docker-compose run -i app
to launch the application and chat with the model. By default, the mistral LLM is used with the nomic-embed-text embedding model. Please drop your files into the Files folder.
To specify a different model, embedding model, or path to your files, please run
docker-compose run -i app python app.py -m MODEL -e EMBEDDING_MODEL -p YOUR_PATH
This will load the PDF and Markdown files, generate embeddings, query the collection, and answer the question defined in app.py, maintaining a conversation history so that the dialogue stays coherent.
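To give a feel for what that pipeline involves, here is one plausible way to wire it together. The library choices (chromadb, ollama, pypdf), the collection layout, and the file name are assumptions for illustration and do not necessarily mirror app.py.

```python
# Sketch of a minimal RAG loop: embed documents, store them in a vector
# collection, retrieve relevant chunks per question, and keep a chat history.
# Library choices (chromadb, ollama, pypdf) are assumptions, not the project's code.
import chromadb
import ollama
from pypdf import PdfReader

EMBED_MODEL = "nomic-embed-text"
LLM_MODEL = "mistral"

def embed(text: str) -> list[float]:
    return ollama.embeddings(model=EMBED_MODEL, prompt=text)["embedding"]

# 1. Load a document from the Files folder and index it page by page.
#    "Files/sample.pdf" is a placeholder path.
collection = chromadb.Client().create_collection("docs")
for i, page in enumerate(PdfReader("Files/sample.pdf").pages):
    text = page.extract_text() or ""
    if text.strip():
        collection.add(ids=[f"page-{i}"], documents=[text], embeddings=[embed(text)])

# 2. Answer a question, keeping a running conversation history.
history: list[dict] = []
question = "What is this document about?"
context = "\n".join(
    collection.query(query_embeddings=[embed(question)], n_results=3)["documents"][0]
)
history.append({"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"})
answer = ollama.chat(model=LLM_MODEL, messages=history)["message"]["content"]
history.append({"role": "assistant", "content": answer})
print(answer)
```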