🔍 Automate company research, email discovery, LinkedIn scraping, and more — all from a simple CSV input.
The AI Research Agent is a local web application that helps businesses and researchers extract contact and company information from websites. It crawls a site's internal pages, extracts emails and phone numbers, scrapes public LinkedIn profiles, and enriches company data through third-party APIs such as Hunter.io.
This tool is ideal for sales teams, lead generation agencies, market researchers, and business analysts who need fast, structured insights about companies and professionals.
Feature | Description |
---|---|
🕸️ Deep Website Crawling | Crawl every internal page of a given website |
📨 Email Extraction | Extract public emails using regex and pattern matching |
📞 Phone Number Extraction | Detect phone numbers from crawled content |
👥 LinkedIn Profile Scraping | Search and scrape relevant LinkedIn profiles |
🔍 Company Enrichment | Get company size, industry, and revenue via the Hunter.io API |
📄 Structured Output | Export results in clean, downloadable CSV format |
⚙️ REST API | Programmatic access via FastAPI backend |
🖥️ Web Interface | Interactive GUI using Streamlit |
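As a rough illustration of the regex-based email and phone extraction listed above, here is a minimal sketch. The patterns and the function names `extract_emails` and `extract_phones` are assumptions made for this example, not the project's actual implementation, and the patterns are deliberately simplified.

```python
import re
from typing import List

# Simplified patterns for illustration; real addresses and numbers vary widely.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional leading "+", then digits separated by spaces, dots, hyphens, or parentheses.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_emails(text: str) -> List[str]:
    """Return unique email addresses, lowercased, in order of first appearance."""
    seen = {}
    for match in EMAIL_RE.findall(text):
        seen.setdefault(match.lower(), None)
    return list(seen)


def extract_phones(text: str) -> List[str]:
    """Return unique phone-like strings that contain 8 to 15 digits."""
    results = []
    for match in PHONE_RE.findall(text):
        digits = re.sub(r"\D", "", match)
        if 8 <= len(digits) <= 15 and match not in results:
            results.append(match)
    return results
```

Lowercasing before de-duplication keeps `Sales@Example.com` and `sales@example.com` from being reported twice. The digit-count filter is a crude guard against matching dates or order numbers.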
Metric | Typical Result |
---|---|
Email extraction from site | ~75% |
Phone number detection | ~60% |
LinkedIn profile match | ~80% |
Company enrichment (Hunter.io) | ~90% |
Average runtime per site | 10–30 seconds |
Note: Accuracy may vary based on website structure, anti-scraping measures, and data availability.
Automate prospecting by collecting contact details of decision-makers at target companies.
Gather insights into industries, company sizes, and trends without manual browsing.
Quickly build targeted lists of companies and their key contacts.
Find potential candidates and verify their current roles using LinkedIn scraping.
Research competitors or partners before entering new markets or partnerships.
- Python 3.8+
- pip package manager
pip install fastapi uvicorn streamlit googlesearch-python requests beautifulsoup4 pandas tldextract lxml scikit-learn playwright aiohttp concurrent-log-handler
Install Playwright Chromium browser:
playwright install chromium
- Clone the project:
  git clone https://github.com/cgmafia/ai-research-agent.git
  cd ai-research-agent
- Start the FastAPI backend:
  cd backend
  uvicorn main:app --reload
- In another terminal, start the Streamlit frontend:
  cd ../frontend
  streamlit run app.py
- Open your browser and go to:
  http://localhost:8501
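The backend's endpoints are not listed in this README, so rather than guess at them, the sketch below reads the OpenAPI document that FastAPI serves at `/openapi.json` by default and lists whatever routes it describes. It assumes uvicorn was started as shown above, without `--host` or `--port`, so that it listens on `127.0.0.1:8000`, and that the app has not disabled its OpenAPI route. The names `list_routes` and `fetch_spec` are hypothetical helpers for this example.

```python
import json
from urllib.request import urlopen

# Assumption: uvicorn defaults (127.0.0.1:8000) were not overridden.
BACKEND_URL = "http://127.0.0.1:8000"


def list_routes(spec: dict) -> list:
    """Return sorted (METHOD, path) pairs from an OpenAPI document."""
    http_methods = {"get", "post", "put", "patch", "delete", "head", "options"}
    routes = []
    for path, operations in spec.get("paths", {}).items():
        for method in operations:
            # Path items may also hold non-method keys such as "parameters".
            if method.lower() in http_methods:
                routes.append((method.upper(), path))
    return sorted(routes)


def fetch_spec(base_url: str = BACKEND_URL) -> dict:
    """Download the OpenAPI document from a running FastAPI backend."""
    with urlopen(f"{base_url}/openapi.json", timeout=5) as response:
        return json.load(response)
```

With the backend running, `list_routes(fetch_spec())` returns the available method and path pairs. The same information is browsable at `http://127.0.0.1:8000/docs` under the same assumptions.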
website
https://techbehemoths.com
https://example-it-company.in
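For reference, an input file in this shape can be loaded and sanity-checked with the standard library alone. This is a minimal sketch that assumes the single `website` column shown above. The helper name `read_websites` and its filtering rules (skip blanks, duplicates, and values that are not HTTP or HTTPS URLs) are choices made for this example and may differ from what the application does.

```python
import csv
from typing import List
from urllib.parse import urlparse


def read_websites(csv_file) -> List[str]:
    """Read the `website` column from an open CSV file object."""
    reader = csv.DictReader(csv_file)
    urls = []
    for row in reader:
        url = (row.get("website") or "").strip()
        parsed = urlparse(url)
        # Keep only absolute http(s) URLs, preserving first-seen order.
        if parsed.scheme in ("http", "https") and parsed.netloc and url not in urls:
            urls.append(url)
    return urls
```

A typical call would be `read_websites(open("data/input.csv", newline=""))`, ideally inside a `with` block so the file is closed afterwards.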
Company Name | Website | Emails Found | Phone Numbers | LinkedIn Profiles | Industry | Size | Annual Revenue |
---|---|---|---|---|---|---|---|
TechBehemoths | https://techbehemoths.com | sales@techbehemoths.com | +91-9876543210 | linkedin.com/in/johndoe, linkedin.com/in/janedoe | IT Services | 11-50 | N/A |
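A row like the one above holds several values in one cell (two LinkedIn profiles) and a placeholder for missing data. The sketch below shows one way such records could be serialized to CSV. The column names come from the sample table. The function name `to_csv`, the `", "` separator, and the `N/A` fallback are assumptions modeled on that sample, not confirmed behavior of the tool.

```python
import csv
import io

COLUMNS = [
    "Company Name", "Website", "Emails Found", "Phone Numbers",
    "LinkedIn Profiles", "Industry", "Size", "Annual Revenue",
]


def to_csv(records) -> str:
    """Serialize result dicts to CSV text using the columns above."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for record in records:
        row = {}
        for column in COLUMNS:
            value = record.get(column)
            # Multi-valued fields are flattened into one comma-separated cell.
            if isinstance(value, (list, tuple)):
                value = ", ".join(value)
            row[column] = value if value else "N/A"
        writer.writerow(row)
    return buffer.getvalue()
```

The `csv` module quotes any cell that contains a comma, so the joined profile list still reads back as a single field.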
This software is intended for ethical use only. Do not scrape non-public data or violate any website's terms of service. Respect robots.txt, privacy laws such as GDPR, and anti-spam laws such as CAN-SPAM.
LinkedIn scraping is limited to publicly available information only.
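One way to honor robots.txt before fetching a page is Python's built-in `urllib.robotparser`. The sketch below is an illustration only. The user agent string `ai-research-agent` and the helper names are hypothetical, and this README does not state how (or whether) the tool performs this check itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical user agent string, chosen for this example.
USER_AGENT = "ai-research-agent"


def build_parser(robots_txt: str) -> RobotFileParser:
    """Create a parser from the text of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)
```

To check a live site instead of a string, call `parser.set_url("https://example.com/robots.txt")` followed by `parser.read()` on a fresh `RobotFileParser`, then use `is_allowed` as above.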
ai-research-agent/
│
├── backend/
│   └── main.py        # FastAPI server
│
├── frontend/
│   └── app.py         # Streamlit UI
│
├── data/
│   └── input.csv      # Your input CSV goes here
│
├── README.md          # This file
└── LICENSE            # MIT License
Contributions are welcome! Whether it’s bug fixes, performance improvements, or adding support for more APIs, feel free to open an issue or submit a pull request.
Please follow the contributing guide for best practices.
For questions, feature requests, or enterprise licensing:
📧 Email: anandvenkatarama@gmail.com
💬 GitHub: github.com/cgmafia
This project is licensed under the MIT License – see the LICENSE file for details.
- Add Apollo.io integration for enhanced company lookup
- Implement multi-threading for faster crawling
- Build Docker image for easy deployment
- Add scheduled scraping with Google Drive auto-save
- Create Streamlit dashboard with filters and charts
- Support proxy rotation and rotating user agents
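As a possible starting point for the multi-threading item above, the sketch below fans a list of URLs out to a thread pool using the standard library. It is not part of the current codebase. `crawl_many` is a hypothetical name, and `fetch` stands in for whatever per-site function the crawler ends up using.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def crawl_many(urls: Iterable[str], fetch: Callable[[str], object],
               max_workers: int = 4) -> Dict[str, object]:
    """Run fetch(url) concurrently; map each URL to its result or its exception."""
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:
                # One failing site should not abort the whole batch.
                results[url] = exc
    return results
```

Threads suit this workload because crawling is dominated by network waits. Keeping `max_workers` small also limits the load placed on target sites.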
Thank you for using the AI Research Agent. We hope this tool helps accelerate your business growth through smart automation and research.