Skip to content

The AI Research Agent is a powerful local web application designed to help businesses and researchers extract valuable information from websites. It crawls websites deeply, extracts emails and phone numbers, scrapes LinkedIn profiles, and enriches company data using third-party APIs like Hunter.io.

Notifications You must be signed in to change notification settings

cgmafia/ai-research-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

1 Commit
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

🤖 AI Research Agent – Web App with API

Python FastAPI Streamlit License

🔍 Automate company research, email discovery, LinkedIn scraping, and more — all from a simple CSV input.


📌 Overview

The AI Research Agent is a powerful local web application designed to help businesses and researchers extract valuable information from websites. It crawls websites deeply, extracts emails and phone numbers, scrapes LinkedIn profiles, and enriches company data using third-party APIs like Hunter.io.

This tool is ideal for sales teams, lead generation agencies, market researchers, and business analysts who need fast, structured insights about companies and professionals.


🚀 Features

Feature Description
🕸️ Deep Website Crawling Crawl every internal page of a given website
📨 Email Extraction Extract public emails using regex and pattern matching
📞 Phone Number Extraction Detect phone numbers from crawled content
👥 LinkedIn Profile Scraping Search and scrape relevant LinkedIn profiles
🔍 Company Enrichment Get company size, industry, revenue via Hunter.io API
📄 Structured Output Export results in clean, downloadable CSV format
⚙️ REST API Programmatic access via FastAPI backend
🖥️ Web Interface Interactive GUI using Streamlit

📊 Success Rate & Stats (Approx.)

Task Success Rate
Email extraction from site ~75%
Phone number detection ~60%
LinkedIn profile match ~80%
Company enrichment (Hunter.io) ~90%
Average runtime per site 10–30 seconds

Note: Accuracy may vary based on website structure, anti-scraping measures, and data availability.


💼 Business Scope & Use Cases

Who Can Benefit?

🏢 Sales Teams

Automate prospecting by collecting contact details of decision-makers at target companies.

📊 Market Researchers

Gather insights into industries, company sizes, and trends without manual browsing.

📣 Lead Generation Agencies

Quickly build targeted lists of companies and their key contacts.

🧑‍💼 Recruiters

Find potential candidates and verify their current roles using LinkedIn scraping.

📈 Entrepreneurs

Research competitors or partners before entering new markets or partnerships.


📦 Installation

Prerequisites

  • Python 3.8+
  • pip package manager

Install Dependencies

pip install fastapi uvicorn streamlit googlesearch-python requests beautifulsoup4 pandas tldextract lxml scikit-learn playwright aiohttp concurrent-log-handler

Install Playwright Chromium browser:

playwright install chromium

🧪 How to Run Locally

  1. Clone the project:

    git clone https://github.com/cgmafia/ai-research-agent.git
    cd ai-research-agent
  2. Start the FastAPI backend:

    cd backend
    uvicorn main:app --reload
  3. In another terminal, start the Streamlit frontend:

    cd ../frontend
    streamlit run app.py
  4. Open your browser and go to:

    http://localhost:8501
    

🧾 Sample Input (input.csv)

website
https://techbehemoths.com
https://example-it-company.in

📤 Sample Output (output_companies_contacts.csv)

Company Name Website Emails Found Phone Numbers LinkedIn Profiles Industry Size Annual Revenue
TechBehemoths https://techbehemoths.com sales@techbehemoths.com +91-9876543210 linkedin.com/in/johndoe, linkedin.com/in/janedoe IT Services 11-50 N/A

🔐 Legal Disclaimer

This software is intended for ethical use only. Do not scrape non-public data or violate terms of service of any website. Respect robots.txt and privacy laws such as GDPR and CAN-SPAM.

LinkedIn scraping is limited to publicly available information only.


📁 Folder Structure

ai-research-agent/
│
├── backend/
│   └── main.py         # FastAPI server
│
├── frontend/
│   └── app.py          # Streamlit UI
│
├── data/
│   └── input.csv       # Your input CSV goes here
│
├── README.md           # This file
└── LICENSE             # MIT License

🤝 Contributing

Contributions are welcome! Whether it’s bug fixes, performance improvements, or adding support for more APIs, feel free to open an issue or submit a pull request.

Please follow the contributing guide for best practices.


📬 Contact

For questions, feature requests, or enterprise licensing:

📧 Email: [anandvenkatarama@gmail.com]
💬 GitHub: github.com/cgmafia


📜 License

This project is licensed under the MIT License – see the LICENSE file for details.


🚀 Future Roadmap

  • Add Apollo.io integration for enhanced company lookup
  • Implement multi-threading for faster crawling
  • Build Docker image for easy deployment
  • Add scheduled scraping with Google Drive auto-save
  • Create Streamlit dashboard with filters and charts
  • Support for proxy rotation and rotating user agents

Thank you for using the AI Research Agent. We hope this tool helps accelerate your business growth through smart automation and research.

About

The AI Research Agent is a powerful local web application designed to help businesses and researchers extract valuable information from websites. It crawls websites deeply, extracts emails and phone numbers, scrapes LinkedIn profiles, and enriches company data using third-party APIs like Hunter.io.

Topics

Resources

Stars

Watchers

Forks

Languages