Patent Innovation Predictor is an intelligent multi-agent system designed to automate the analysis, retrieval, and forecasting of patent innovations. Built with CrewAI, LangChain, Ollama, and OpenSearch, the system enables powerful Retrieval-Augmented Generation (RAG) workflows tailored for patent data. It dynamically analyzes any research domain (e.g., lithium batteries, hydrogen storage, quantum computing) to uncover trends and predict future innovations.
- ✨ Agentic Workflow: Uses four CrewAI agents with well-defined roles.
- 🔮 Retrieval-Augmented Generation (RAG): Real-time access to an OpenSearch-powered patent database.
- 📆 Custom Research Domains: Works with any field the user inputs at runtime.
- 🧑‍💼 LLM Reasoning with Data: Merges patent metadata with LLM-driven insights.
- ⚡ Ollama for Local LLMs: Runs inference locally, so prompts and retrieved patents are not sent to an external LLM provider.
- 🔹 Modular Design: Easily extensible with new tools, agents, or data connectors.
Component | Technology | Purpose |
---|---|---|
Language | Python 3.9+ | Core logic and orchestration |
Agents | CrewAI | Multi-agent workflow manager |
Prompt/LLM Chains | LangChain | Prompt templates, chaining, and LLM abstraction |
LLM Server | Ollama + LangChain | Local LLM execution (LLaMA3, Mistral, etc.) |
Search Engine | OpenSearch | Patent indexing, keyword & vector search |
Embeddings | nomic-embed-text | Patent vector embeddings via Ollama |
Data API | SerpAPI | Real-time patent data scraping |
Containerization | Docker | Runs Ollama & OpenSearch locally |
Storage | JSON + OpenSearch | Indexed patents and search results |
Agent | Purpose | Tools Used |
---|---|---|
💼 Research Director | Define research goals, key focus areas, time ranges | LLM only |
🧰 Patent Retriever | Query OpenSearch, retrieve & organize patents | `search_patents`, `search_by_date_range` |
📊 Data Analyst | Detect trends, tech evolution, and innovation hotspots | `analyze_patent_trends` |
🌌 Innovation Forecaster | Predict future breakthroughs and recommend R&D areas | LLM only |
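
To make the Data Analyst's tool more concrete, here is a minimal sketch of what a trend-analysis helper could compute. Only the name `analyze_patent_trends` comes from the table above; the record keys (`publication_date`, `assignee`), the `top_n` parameter, and the returned structure are assumptions for illustration, not the project's actual implementation.

```python
from collections import Counter
from typing import Dict, List


def analyze_patent_trends(patents: List[dict], top_n: int = 3) -> Dict[str, list]:
    """Summarize filings per year and the most frequent assignees."""
    per_year = Counter()
    assignees = Counter()
    for patent in patents:
        # Hypothetical key holding an ISO date string such as "2023-07-09".
        date = patent.get("publication_date", "")
        if len(date) >= 4 and date[:4].isdigit():
            per_year[date[:4]] += 1
        # Hypothetical key holding the owning company or institution.
        if patent.get("assignee"):
            assignees[patent["assignee"]] += 1
    return {
        "patents_per_year": sorted(per_year.items()),
        "top_assignees": assignees.most_common(top_n),
    }
```

Records without a usable date are skipped for the yearly count but still contribute to the assignee ranking.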
- Patent Collection:
  - Run `information_collector.py` to fetch patent data from SerpAPI.
  - Save as JSON and load into OpenSearch using `opensearch_client.py`.
- Embedding Indexing:
  - Generate vector embeddings using `embedding.py` via `nomic-embed-text`.
  - Store embeddings in OpenSearch for hybrid (keyword + vector) search.
- Main Pipeline Execution:
  - Execute `agentic_rag.py`.
  - Prompts user for research area (e.g., "Hydrogen Storage") and LLM model (e.g., "llama3").
  - Checks if Ollama and OpenSearch are running.
  - Instantiates agents and their tools.
  - Agents sequentially execute tasks, passing data forward.
  - Final report is saved to `/results/`.
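
The service check in the pipeline step could be done along these lines. This is a sketch using only the standard library, not the code in `agentic_rag.py`; the function names are hypothetical, and the URLs assume Ollama's default port and the OpenSearch address from the setup notes, served over plain HTTP.

```python
import http.client
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434"     # Ollama's default port
OPENSEARCH_URL = "http://localhost:9200"  # address used in the setup notes


def is_service_up(url: str, timeout: float = 3.0) -> bool:
    """Return True if an HTTP request to `url` gets any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with an error status (e.g. 401), so it is running.
        return True
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, DNS failure, timeout, or a malformed response.
        return False


def check_services(services: dict) -> list:
    """Return the names of the services that did not respond."""
    return [name for name, url in services.items() if not is_service_up(url)]
```

Calling `check_services({"Ollama": OLLAMA_URL, "OpenSearch": OPENSEARCH_URL})` returns an empty list when both services respond, which lets the CLI stop early with a clear message otherwise.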
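
Clone the repository and install the dependencies:

    git clone <repo-url>
    python -m venv .venv
    .\.venv\Scripts\activate
    pip install -r requirements.txt

The activation command above is for Windows; on macOS or Linux, use `source .venv/bin/activate` instead.
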
Create a `.env` file with your SerpAPI key:

    SERPAPI_API_KEY=your_key_here
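
For illustration, a script could read this key as sketched below. This uses only the standard library and is an assumption about how the key might be loaded; the project may instead rely on a package such as `python-dotenv`. The helper names `load_env_file` and `get_serpapi_key` are hypothetical.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> dict:
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_serpapi_key(path: str = ".env") -> str:
    """Prefer a real environment variable, then fall back to the .env file."""
    key = os.environ.get("SERPAPI_API_KEY") or load_env_file(path).get("SERPAPI_API_KEY")
    if not key:
        raise RuntimeError("SERPAPI_API_KEY is not set; add it to .env")
    return key
```

Failing fast when the key is missing avoids a less obvious error later, during the first SerpAPI request.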

Follow the instructions in the OpenSearch docs to set up OpenSearch, and ensure it is running at `localhost:9200`.

Start Ollama in Docker and pull an LLM:

    docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
    docker exec -it ollama ollama pull llama3

For embeddings:

    docker exec -it ollama ollama pull nomic-embed-text
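
Once the embedding model is pulled, a vector can be requested from the Ollama container over HTTP. The sketch below uses Ollama's `/api/embeddings` endpoint with the standard library; it is not the code in `embedding.py`, and the helper names are hypothetical. Newer Ollama releases also offer an `/api/embed` endpoint with a slightly different request shape, so check the API docs for the version you run.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # port published by the docker run command


def build_embedding_request(text: str, model: str = "nomic-embed-text") -> urllib.request.Request:
    """Build a POST request for Ollama's /api/embeddings endpoint."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        f"{OLLAMA_URL}/api/embeddings",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed_text(text: str, model: str = "nomic-embed-text", timeout: float = 30.0) -> list:
    """Send the request and return the embedding vector as a list of floats."""
    request = build_embedding_request(text, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["embedding"]
```

For example, `embed_text("solid-state electrolyte for lithium batteries")` returns one vector that can be stored alongside the patent record.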

Collect and index the data, then run the pipeline:

    python information_collector.py   # collect
    python opensearch_client.py       # index into OpenSearch
    python embedding.py               # add embeddings
    python agentic_rag.py

Project structure:

    Ai_Agent/
    ├── agentic_rag.py            # Main CLI pipeline
    ├── patent_crew.py            # Agents and tasks setup
    ├── opensearch_client.py      # Index management
    ├── embedding.py              # Embedding generator
    ├── information_collector.py  # Patent scraper (SerpAPI)
    ├── helper.py                 # Utility functions
    ├── .env                      # Secrets (API keys)
    ├── requirements.txt          # Python dependencies
    ├── files/                    # Raw patent data
    ├── results/                  # Analysis output
    └── README.md                 # Project documentation

- Switch LLM Model: At runtime, specify any model available in Ollama (e.g., "mistral").
- Change Research Domain: Just input a new research field at runtime.
- Add New Agents: Extend `patent_crew.py` to include critics, validators, or enrichment agents.
- Output Formats: Change agent outputs to JSON/CSV for dashboards or API use.
- Web Interface: Integrate with FastAPI or Streamlit for interactive UI.
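
As one way to approach the output-format option above, retrieved patent records could be serialized with the standard library. This is a sketch under assumptions: the field names in `FIELDS` and both helper functions are hypothetical and would need to match the keys the project actually stores.

```python
import csv
import io
import json

# Hypothetical field names; the real patent records may use different keys.
FIELDS = ["title", "assignee", "publication_date"]


def patents_to_json(patents: list) -> str:
    """Serialize patent records to a JSON string, keeping only known fields."""
    rows = [{field: p.get(field, "") for field in FIELDS} for p in patents]
    return json.dumps(rows, indent=2)


def patents_to_csv(patents: list) -> str:
    """Serialize patent records to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for patent in patents:
        # Missing fields become empty cells; unknown fields are dropped.
        writer.writerow({field: patent.get(field, "") for field in FIELDS})
    return buffer.getvalue()
```

Restricting the output to a fixed field list keeps the columns stable for dashboards even when individual records carry extra metadata.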

Example session:

    Enter the research area to analyze (default: Lithium Battery): Hydrogen Storage
    Enter the Ollama model to use (default: llama2): llama3
    [...Agents running...]
    Analysis completed and saved to: /results/patent_analysis_20250704_153000.txt

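
The filename in the session above follows a `patent_analysis_YYYYMMDD_HHMMSS.txt` pattern. A minimal sketch of producing such a path is shown here; the function names and the relative `results` default are assumptions, not the project's actual code.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def build_report_path(results_dir: str = "results", now: Optional[datetime] = None) -> Path:
    """Return results_dir/patent_analysis_YYYYMMDD_HHMMSS.txt for the given time."""
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return Path(results_dir) / f"patent_analysis_{stamp}.txt"


def save_report(text: str, results_dir: str = "results") -> Path:
    """Write the report, creating the results directory if needed."""
    path = build_report_path(results_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Including seconds in the timestamp means repeated runs do not overwrite earlier reports unless they finish within the same second.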
- Add Agent Memory for long-term context.
- Introduce Critic Agent for validation.
- Integrate visual analytics (charts/timelines).
- Expose as API or Web App.

End-to-End Workflow:

- User Input
  - User starts the CLI (`python agentic_rag.py`).
  - User enters the research area (e.g., "Lithium Battery") and selects the Ollama LLM model (e.g., "llama3").
- Environment & Service Checks
  - Script checks if Ollama (LLM server) and OpenSearch (vector DB) are running and accessible.
  - Ensures required models are available in Ollama.
- Patent Data Collection & Indexing
  - `information_collector.py` fetches patent data from SerpAPI for the specified research area.
  - Data is saved as JSON files in the `files/` or `results/` directory.
  - `embedding.py` generates vector embeddings for patent abstracts using Ollama’s embedding model.
  - `opensearch_client.py` creates the OpenSearch index (if not present) and ingests the patent data with embeddings.
- Agentic RAG Workflow (CrewAI Orchestration)
  - Four agents are instantiated:
    - Research Director: Defines the research plan for the chosen area.
    - Patent Retriever: Uses OpenSearch tools to retrieve and organize relevant patents.
    - Data Analyst: Analyzes trends, patterns, and company focus from the retrieved data.
    - Innovation Forecaster: Predicts future breakthroughs and R&D priorities.
  - Each agent executes its task in sequence, passing outputs to the next agent.
- Patent Search & Retrieval
  - Agents use CrewAI tools to query OpenSearch for patents matching the research area and time window.
  - Results are grouped, summarized, and prepared for analysis.
- Trend Analysis & Forecasting
  - Data Analyst agent identifies innovation trends, key companies, and emerging technologies.
  - Innovation Forecaster agent predicts future directions, R&D priorities, and disruptive trends.
- Reporting
  - The final output (comprehensive analysis and forecast) is saved to a timestamped `.txt` file in the project directory.
- User Review
  - User reviews the saved report for insights, trends, and recommendations.

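
The retrieval step combines keyword and vector search over a time window. One possible shape for such a request body is sketched below, using an OpenSearch `bool` query with a `match` clause, a k-NN clause, and an optional `range` filter. The field names (`abstract`, `embedding`, `publication_date`) and the function name are hypothetical, and the sketch assumes the index was created with k-NN enabled and a `knn_vector` field. This approach simply adds the two scores; OpenSearch also provides a dedicated `hybrid` query with score normalization, which may suit production use better.

```python
from typing import List, Optional

# Hypothetical index field names; match them to the mapping in opensearch_client.py.
TEXT_FIELD = "abstract"
VECTOR_FIELD = "embedding"
DATE_FIELD = "publication_date"


def build_hybrid_query(
    text: str,
    vector: List[float],
    k: int = 10,
    date_from: Optional[str] = None,
    date_to: Optional[str] = None,
) -> dict:
    """Combine a keyword match and a k-NN clause, optionally limited to a date range."""
    bool_query = {
        "should": [
            {"match": {TEXT_FIELD: text}},
            {"knn": {VECTOR_FIELD: {"vector": vector, "k": k}}},
        ],
        "minimum_should_match": 1,
    }
    date_range = {}
    if date_from:
        date_range["gte"] = date_from
    if date_to:
        date_range["lte"] = date_to
    if date_range:
        # Filters restrict the result set without affecting relevance scores.
        bool_query["filter"] = [{"range": {DATE_FIELD: date_range}}]
    return {"size": k, "query": {"bool": bool_query}}
```

The returned dictionary can be passed as the search body to an OpenSearch client, with the query vector produced by the same embedding model used at indexing time.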
Technologies Involved at Each Step:
- Python: Orchestration, agent logic, and CLI.
- CrewAI: Multi-agent workflow and task management.
- LangChain: LLM prompt management and chaining.
- Ollama: Local LLM and embedding model serving (via Docker).
- OpenSearch: Patent data storage, keyword and vector search.
- SerpAPI: Patent data collection from Google Patents.
- Docker: Containerization for Ollama and OpenSearch services.

Summary Diagram:

    User Input
        ↓
    Service Checks (Ollama, OpenSearch)
        ↓
    Patent Data Collection (SerpAPI → JSON)
        ↓
    Embedding Generation (Ollama)
        ↓
    Indexing in OpenSearch
        ↓
    CrewAI Agentic Pipeline:
        Research Plan → Patent Retrieval → Trend Analysis → Innovation Forecast
        ↓
    Report Generation (.txt)
        ↓
    User Review
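
The agent stage of the diagram is a sequential hand-off: each agent consumes the previous agent's output. The sketch below shows that pattern in plain Python. It is a stand-in to illustrate the data flow, not CrewAI code; the `run_sequential` helper and the placeholder lambdas are hypothetical, and real agents would call an LLM and tools instead of formatting strings.

```python
from typing import Callable, List, Tuple

# Each stage takes the previous stage's output and returns its own.
Stage = Tuple[str, Callable[[str], str]]


def run_sequential(stages: List[Stage], initial_input: str) -> str:
    """Run stages in order, feeding each output into the next stage."""
    data = initial_input
    for name, stage in stages:
        data = stage(data)
        print(f"[{name}] done")
    return data


# Placeholder stages standing in for the four agents.
stages: List[Stage] = [
    ("Research Director", lambda area: f"plan({area})"),
    ("Patent Retriever", lambda plan: f"patents({plan})"),
    ("Data Analyst", lambda patents: f"trends({patents})"),
    ("Innovation Forecaster", lambda trends: f"forecast({trends})"),
]

report = run_sequential(stages, "Hydrogen Storage")
print(report)  # forecast(trends(patents(plan(Hydrogen Storage))))
```

Because every stage depends only on the output before it, a new stage such as a critic can be added by inserting one more entry in the list.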