Autonomous-AI-Agent-

This project is an automated pipeline that performs browser-driven web searches, extracts relevant content, summarizes it using AI, and generates a clean PDF report.

Key components used:

Browser Automation: Playwright drives the browser to run the web search.
Information Extraction: Scrapy visits each URL and scrapes the data from the relevant website.
Summarization Engine: Converts raw content into concise summaries using a local/external summarizer.
Keyword Extraction: Identifies core keywords for each source using KeyBERT/spaCy.
PDF Generation: Formats scraped and summarized content into a well-structured PDF (no Unicode errors, no emojis, images + tables supported).
File System Execution Layer: Manages file creation, reading, and organization from the pipeline.
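
The PDF step above promises no Unicode errors and no emojis. A minimal sketch of one way to get there, assuming the report is rendered with `fpdf`'s built-in fonts, which are limited to Latin-1 text. The helper name `to_latin1_safe` is hypothetical and is not taken from the repository.

```python
import unicodedata

# Typographic punctuation that Unicode normalization leaves unchanged,
# mapped to plain ASCII equivalents.
PUNCTUATION_MAP = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2013": "-", "\u2014": "-",   # en dash, em dash
}


def to_latin1_safe(text: str) -> str:
    """Return text containing only characters that encode as Latin-1.

    Hypothetical helper: an assumption about how the "no Unicode errors,
    no emojis" guarantee could be met, not the project's actual code.
    """
    for src, dst in PUNCTUATION_MAP.items():
        text = text.replace(src, dst)
    # NFKC folds compatibility characters (ligatures, the ellipsis
    # character) into simpler forms while keeping accented letters composed.
    normalized = unicodedata.normalize("NFKC", text)
    # Anything still outside Latin-1 (emojis, CJK, ...) is dropped.
    return normalized.encode("latin-1", errors="ignore").decode("latin-1")


if __name__ == "__main__":
    print(to_latin1_safe("\u201cGreat\u201d phone \u2013 caf\u00e9 \U0001F44D"))
```

Dropping unsupported characters is the simplest policy. Replacing them with `?` (`errors="replace"`) would make the loss visible in the report instead.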

Folder Structure

search_scrape_summarize/
├── search_urls.json          # URL list extracted from search
├── scraped_output.json       # Raw scraped + summarized output
├── final_summary.json        # Cleaned and structured final output
├── file_system_handler.py    # PDF generation + file ops
├── keyword_extractor.py      # Extracts keywords from summary
├── search_scrape_summarize.py  # Main driver pipeline
├── summarizer.py             # Summary logic (external/local)
├── scrapy_project/
│   └── spiders/
│       └── search_spider.py  # Scrapy spider for URL crawling
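
The three JSON files in the tree above act as hand-off points between stages. The sketch below shows how such a flow could look using only the standard library. The record fields (`url`, `raw_text`, `summary`) and the function `build_final_summary` are assumptions for illustration; the repository's real schema is not documented here.

```python
import json
import tempfile
from pathlib import Path


def write_json(path: Path, data) -> None:
    """Write data as pretty-printed UTF-8 JSON."""
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False), encoding="utf-8")


def read_json(path: Path):
    """Load JSON from a UTF-8 file."""
    return json.loads(path.read_text(encoding="utf-8"))


def build_final_summary(scraped: list[dict]) -> list[dict]:
    """Keep records with a non-empty summary and drop the raw text.

    Hypothetical cleaning step standing in for whatever produces
    final_summary.json in the real pipeline.
    """
    final = []
    for record in scraped:
        summary = record.get("summary", "").strip()
        if summary:
            final.append({"url": record["url"], "summary": summary})
    return final


if __name__ == "__main__":
    workdir = Path(tempfile.mkdtemp())

    # Stage 1: the search step stores the URLs it found.
    write_json(workdir / "search_urls.json", ["https://example.com/review"])

    # Stage 2: scraping and summarizing produce one record per URL
    # (placeholder values stand in for real scraped content).
    urls = read_json(workdir / "search_urls.json")
    scraped = [
        {"url": u, "raw_text": "placeholder page text", "summary": " placeholder summary "}
        for u in urls
    ]
    write_json(workdir / "scraped_output.json", scraped)

    # Stage 3: the cleaned, structured output feeds the PDF generator.
    final = build_final_summary(read_json(workdir / "scraped_output.json"))
    write_json(workdir / "final_summary.json", final)
    print(final)
```

Keeping each stage's output on disk makes it possible to rerun a later stage, such as PDF generation, without repeating the search and scrape.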

Tools used

| Purpose | Tool/Library |
| --- | --- |
| Browser Automation | `Playwright` |
| Web Scraping | `Scrapy`, `BeautifulSoup` |
| Summarization | External or Local LLM |
| Keyword Extraction | `KeyBERT`, `spaCy` |
| PDF Report Generator | `fpdf` |

Working (End-to-End)

Instruction Parsing: Receives a command like:
“Search smartphone reviews, extract pros/cons, generate PDF.”
Intent Detection: Classifier tags the instruction as:
web + summarization + file_handler
Pipeline Triggered:
- Browser_automation.py handles the flow
- Scrapes URLs via search_spider.py
- Extracts + summarizes content
- Stores structured output in JSON
- Generates PDF via file_system_handler.py
PDF Output:
- Title
- Source URLs
- Summaries
- Pros/Cons
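
The intent detection step above can be illustrated with a small rule-based tagger. This is only a sketch: the README does not say how the classifier works, so the keyword table `INTENT_KEYWORDS` and the function `detect_intents` are hypothetical stand-ins, not the logic in `instruction_parser.py`.

```python
# Hypothetical keyword table: each intent tag fires when any of its
# trigger words appears in the instruction.
INTENT_KEYWORDS = {
    "web": ("search", "browse", "find"),
    "summarization": ("summarize", "summary", "extract"),
    "file_handler": ("pdf", "file", "save", "report"),
}


def detect_intents(instruction: str) -> list[str]:
    """Return the intent tags whose trigger words occur in the instruction.

    Matching is a case-insensitive substring test, so it is deliberately
    crude; a trained classifier could replace it behind the same signature.
    """
    text = instruction.lower()
    return [
        intent
        for intent, words in INTENT_KEYWORDS.items()
        if any(word in text for word in words)
    ]


if __name__ == "__main__":
    command = "Search smartphone reviews, extract pros/cons, generate PDF."
    print(" + ".join(detect_intents(command)))
```

For the sample command, the tagger yields the same three tags listed in the walkthrough, in the order the table defines them. The pipeline would then dispatch one handler per tag.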

# Activate virtualenv
source venv/bin/activate  # on Windows: venv\Scripts\activate

# Run the pipeline
python main.py

# Output PDF will be at:
# final_report.pdf

Gouthamjs15/Autonomous-AI-Agent-
