Multimodal RAG system that ingests data containing images, text, and tables from the well known paper “Attention Is All You Need”. The system able to retrieve relevant information and reason over the multimodal data to answer user queries.
To install the necessary dependencies, you can use:
export OPENAI_API_KEY=<ADD YOUR API KEY HERE>
export LLAMA_CLOUD_API_KEY=<ADD YOUR API CLOUD API KEY HERE>
pip install -r requirements.txt
streamlit run app.py