A complete demonstration of building an AI agent that can answer questions about enterprise documents stored in Box using MongoDB Atlas vector search and LangGraph workflows.
This project shows you how to:
- Connect to Box and upload documents
- Process documents with LangChain and create text chunks
- Store embeddings in MongoDB Atlas with vector search capabilities
- Build AI tools for vector and full-text search
- Create a LangGraph agent that can intelligently use these tools
- Enable conversational memory with MongoDB checkpointing
Box Documents โ LangChain Processing โ MongoDB Atlas Vector Store โ LangGraph Agent โ AI Responses
โ โ
Sample Data (5 Tech Earnings Reports) Vector + Full-Text Search
You'll need accounts and API keys for:
- Box Developer Account (for document storage)
- MongoDB Atlas (for vector search)
- OpenAI (for embeddings and language models)
git clone https://github.com/box-community/ai-agent-langgraph-atlas-vector-search.git
cd ai-agent-langgraph-atlas-vector-search
uv sync # This installs all dependencies
Create a .env
file in the project root:
# Box Configuration
BOX_CLIENT_ID=your_box_client_id
BOX_CLIENT_SECRET=your_box_client_secret
BOX_SUBJECT_ID=your_box_user_id
# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
# MongoDB Configuration
MONGODB_URI=your_mongodb_connection_string
Execute the scripts in order:
# Step 1: Initialize data and create indexes
python src/a_init_data.py
# Step 2: Test the search tools
python src/b_agent_tools.py
# Step 3: Test basic agent functionality
python src/c_agent_demo.py
# Step 4: Run the full LangGraph workflow
python src/d_langgraph.py
โโโ sample_data/
โ โโโ Q4 Tech earnings-Demo/ # Sample financial documents
โโโ rag_demo/
โ โโโ README.md # Jupyter notebook example
โ โโโ rag_demo.ipynb # Interactive demo notebook
โโโ src/
โ โโโ a_init_data.py # Data initialization and indexing
โ โโโ b_agent_tools.py # Search tools definition
โ โโโ c_agent_demo.py # Basic agent setup
โ โโโ d_langgraph.py # Complete LangGraph workflow
โโโ README.md # This file
What it does:
- Connects to Box using Client Credentials Grant (CCG) authentication
- Uploads sample earnings reports to Box
- Downloads and processes documents with LangChain
- Creates vector embeddings using OpenAI
- Sets up MongoDB Atlas vector and full-text search indexes
Key functions:
get_box_client()
- Initializes Box connectionupload_sample_data()
- Uploads documents to Boxload_documents_from_box()
- Processes documents with LangChaincreate_vector_index()
- Creates MongoDB vector search index
What it does:
- Defines LangChain tools for vector and full-text search
- Implements semantic search using MongoDB Atlas Vector Search
- Implements keyword search using MongoDB Atlas Full-Text Search
Tools created:
vector_search()
- Finds semantically similar contentfull_text_search()
- Finds exact text matches
What it does:
- Creates a ChatOpenAI model with tool binding
- Defines system prompts for the agent
- Tests tool selection and invocation
Key components:
get_llm_with_tools()
- Configures LLM with search tools- System prompt that guides tool usage
What it does:
- Builds a complete conversational agent using LangGraph
- Implements state management and tool routing
- Adds conversation memory with MongoDB checkpointing
- Enables multi-turn conversations
Graph nodes:
agent
- Makes decisions and calls toolstools_node
- Executes search tools- Conditional routing between nodes
For a hands-on experience, use the Jupyter notebook RAG demo:
jupyter notebook rag_demo/rag_demo.ipynb
The notebook walks through each step interactively and shows the results at each stage.
Question: "What are the biggest challenges facing tech companies?"
Agent Process:
- Receives user question
- Decides to use
vector_search
tool - Retrieves relevant document chunks
- Synthesizes answer from search results
- Provides response with context
Sample Response:
Based on the earnings reports, major tech companies face several challenges:
1. **China Macro Environment**: Challenging macroeconomic conditions affecting operations
2. **Infrastructure Costs**: Significant growth in infrastructure expenses
3. **AI Investment Requirements**: Need for continued investment in data centers and AI
4. **Shifting Technology Preferences**: Changes in consumer preferences for Western technology
Sources: Apple_analysis.docx, Meta_analysis.docx, NVIDIA_analysis.docx
- Automatic chunking with configurable size and overlap
- Metadata preservation including source URLs and titles
- Error handling for duplicate files and API issues
- Vector search for semantic similarity
- Full-text search for exact matches
- Hybrid approach combining both search types
- Thread-based conversations using MongoDB checkpointing
- State persistence across multiple interactions
- Context retention for follow-up questions
- Box CCG authentication for secure enterprise access
- MongoDB Atlas for production-scale vector search
- Configurable chunking and retrieval parameters
# In a_init_data.py, adjust chunking parameters
text_splitter = RecursiveCharacterTextSplitter(
chunk_size=500, # Increase for larger chunks
chunk_overlap=50 # Adjust overlap as needed
)
# In b_agent_tools.py, modify retrieval parameters
retriever = vector_store.as_retriever(
search_type="similarity",
search_kwargs={"k": 10}, # Return more results
)
# In c_agent_demo.py, switch models
llm = ChatOpenAI(model="gpt-3.5-turbo") # Use different model
- API Keys: Store all credentials in environment variables
- Box Permissions: Documents extracted lose Box security controls
- MongoDB Access: Use MongoDB Atlas built-in security features
- Rate Limits: Monitor OpenAI API usage and costs
Box Authentication Failed
# Check your credentials in .env file
# Ensure Box app has correct permissions
# Verify user ID is correct
MongoDB Connection Error
# Verify connection string format
# Check network access to Atlas cluster
# Ensure cluster is running
OpenAI API Errors
# Check API key validity
# Monitor rate limits and quotas
# Verify model availability
- LangChain Documentation: https://python.langchain.com/
- LangGraph Guide: https://langchain-ai.github.io/langgraph/
- MongoDB Atlas Vector Search: https://www.mongodb.com/docs/atlas/atlas-vector-search/
- Box Developer Documentation: https://developer.box.com/
- OpenAI API Reference: https://platform.openai.com/docs/
This is a demonstration project designed for learning. Feel free to:
- Experiment with different models and parameters
- Add new document types and processing methods
- Implement additional search strategies
- Extend the agent capabilities
This project is provided as-is for educational and development purposes.