Welcome to the Makers AI Assistant! This project is an intelligent, conversational AI designed to help clients find the perfect freelancer for their needs. It combines a powerful question-answering system with a personalized recommendation engine to create a seamless and helpful user experience.
- ๐ค Conversational AI Chat: A friendly and interactive chat interface built with Streamlit.
- โก Dynamic Model Selection: Choose between Google's powerful
Gemini 1.5 Flash
for deep reasoning or Groq's blazing-fastLlama3 8B
for near-instant responses. - ๐ง Intent Detection: The assistant intelligently understands whether you're asking a question or looking for a freelancer.
- ๐ Knowledge Base Q&A: Utilizes a Retrieval-Augmented Generation (RAG) service to answer questions about the platform using a library of documents.
- โจ Personalized Freelancer Recommendations: Recommends the best freelancers for the job based on the conversation history, scoring them on skills, specialties, and more.
- ๐ Dynamic UI: The recommendation panel updates in real-time as the conversation evolves.
- ๐ Feedback Mechanism: Users can provide direct feedback (๐/๐) on the assistant's answers, which is logged for future analysis.
- โ Comprehensive Testing: Includes a full suite of unit and integration tests to ensure reliability.
- โก Optimized Performance: Implements smart caching for the
/chat
endpoint, significantly speeding up responses to repeated queries. - ๐ก๏ธ Enhanced Security:
- Prompt Injection Defense: Incoming queries are scanned for common prompt injection patterns and rejected if detected.
- Inappropriate Content Filtering: Utilizes Gemini's built-in safety settings and Groq's
Llama-Guard-4
model to filter harmful or inappropriate content.
The assistant's backend has been significantly upgraded to a sophisticated multi-agent architecture, enhancing its reasoning capabilities and specialization:
- ๐ค Dedicated Agent Roles:
- Research Agent: Responsible for gathering information from the knowledge base and freelancer database.
- Customer-Facing Agent: Interacts directly with the user, parses queries (including budget constraints), and formulates initial responses.
- Manager Agent: Oversees the process, approves final responses, and handles escalations for sensitive topics.
- ๐งฑ Structured Data Flow: Implemented
AgentState
to manage the flow of information between agents. This includes structuredresearch_findings
(containing freelancers, articles, and knowledge base chunks) and anescalation_topic
field. - ๐ฐ Enhanced Budget Handling: The Customer-Facing Agent now intelligently parses budget constraints from user queries (e.g., "find a developer under $50/hour"). It filters freelancer recommendations accordingly and provides specific lists of matching freelancers or clearly states if none are found.
- ๐ก๏ธ Sensitive Query Escalation: Financial queries related to platform margins, fees, or commissions are now automatically detected by the Customer-Facing Agent and escalated to the Manager Agent. The Manager Agent provides a standardized, authoritative response, ensuring consistent and approved messaging for sensitive topics.
- ๐ฏ Targeted Recommendations: The system now provides more relevant recommendations by distinguishing between requests for freelancers and general questions, and by incorporating article recommendations directly into chat responses when appropriate.
- ๐ฌ Embedded Recommendations: Freelancer and article cards are now embedded directly within the same chat bubble as the AI's response, providing a more integrated and seamless experience.
- ๐งน Cleaner Agent Responses: Meta-commentary and extraneous prefaces from the Manager Agent (e.g., "Here's a refined response...") are now stripped, presenting only the core message to the user.
- ๐ Optimized Sidebar Layout:
- The sidebar section has been renamed to "Cost, Agentic workflow, and Recommendations Monitoring."
- Duplicate "Cost Evolution" graphs have been removed.
- The API Cost Evolution chart is now correctly positioned under the new header and its height has been adjusted for a more compact and user-friendly sidebar.
- ๐ Anchored Chat Input: The chat input field now remains fixed at the bottom of the conversation thread, ensuring it's always accessible.
- Frontend: Streamlit - For the interactive web application.
- Backend: Flask - For the robust API server.
- AI & Machine Learning:
- LLMs: Google Gemini, Groq Llama3
- Embeddings: Google
text-embedding-004
- Vector Search: FAISS - For efficient similarity search.
- Testing: Pytest, Pytest-Mock
- Programming Language: Python 3.10+
Follow these steps to get the project running on your local machine.
- Python 3.10 or higher
pip
andvenv
for package management
-
Clone the repository:
git clone https://github.com/your-username/personalized-recommendation-capabilities.git cd personalized-recommendation-capabilities
-
Set up the virtual environment:
python3 -m venv .venv source .venv/bin/activate
-
Install the required packages:
pip install -r requirements.txt
-
Set up your environment variables:
- Create a file named
.env
by copying the example:cp .env.example .env
- Add your API keys to the
.env
file:GEMINI_API_KEY="YOUR_GEMINI_API_KEY_HERE" GROQ_API_KEY="YOUR_GROQ_API_KEY_HERE"
- Create a file named
-
Start the backend server:
python -m backend
-
Start the frontend application (in a new terminal):
streamlit run frontend/app.py
To ensure everything is working correctly, run the test suite:
source .venv/bin/activate
pytest
- Frontend (
frontend/app.py
): The Streamlit app captures user input and the selected AI model. - API Call: It sends the query, chat history, and chosen model to the backend
/chat
endpoint. - Backend (
backend/app.py
):- The Flask server receives the request.
- Intent Detection: It analyzes the query to determine if it's a
question_answering
orrecommendation
intent. - Routing:
- For questions, it calls the
RAGService
to find relevant documents and generate an answer using the selected model (Gemini or Groq). - For freelancer requests, it returns a confirmation message, triggering the frontend to fetch recommendations.
- For questions, it calls the
- Recommendation Update: The frontend makes a separate call to the
/recommendations
endpoint. This uses theRecommendationService
to score and rank freelancers based on the full chat history. - Security & Optimization Pre-checks (for
/chat
):- Caching: The system first checks an in-memory cache. If the same query (and model) was processed recently, the cached response is returned instantly.
- Prompt Injection Scan: If not cached, the query is scanned for malicious patterns. If flagged, it's rejected.
- Content Moderation (Groq): If using a Groq model, the query is then sent to
Llama-Guard-4
for a safety check. If deemed unsafe, it's rejected.
- RAG Service & LLM Interaction:
- For questions, the
RAGService
retrieves relevant documents. - It then calls the selected LLM (Gemini or Groq) to generate an answer, incorporating context and chat history.
- Content Moderation (Gemini): Gemini's API has safety settings configured to block harmful responses. If a response is blocked, a generic safety message is returned.
- For questions, the
- Recommendation Update: The frontend makes a separate call to the
/recommendations
endpoint. This uses theRecommendationService
to score and rank freelancers based on the full chat history. - Feedback Loop: User feedback is sent to the
/feedback
endpoint and logged as structured JSON for future analysis.
POST /chat
: Main endpoint for conversational interactions. Handles intent detection and model selection.POST /recommendations
: Fetches updated freelancer and article recommendations.POST /feedback
: Logs user feedback on assistant responses.
- Advanced Interaction: Implement sentiment detection for adaptive responses.
- Robustness & Security: Implement rate limiting and explore more advanced security measures.
- Monitoring & Analytics: Integrate comprehensive tracking for latency, cost, API call metrics, and user engagement.
- Database Integration: Migrate from JSON files to a robust database (e.g., SQLite, PostgreSQL) for persistent storage of freelancer profiles, chat history, and feedback.
- Automated Evaluation: Develop scripts for automated evaluation against ground truth datasets to measure accuracy and relevance.
- Advanced Prompting: Experiment with techniques like few-shot prompting and chain-of-thought for more complex reasoning.
- Context Compression: Implement strategies to compress context for LLM API calls, optimizing for cost and token limits.
- Scalability: Deploy the application using a production-ready WSGI server like Gunicorn and explore containerization (e.g., Docker).