Skip to content

Woven-Web/WeaveBot

Repository files navigation

WeaveBot - Intelligent Event Assistant πŸ€–

Python Playwright OpenAI

An intelligent Telegram bot that extracts event information from web pages using Playwright for browser automation and OpenAI GPT-4o for intelligent data extraction.

πŸš€ Key Features

  • 🌐 Universal Web Scraping: Handles JavaScript-heavy sites (Lu.ma, Meetup, etc.) with Playwright
  • 🧠 AI-Powered Extraction: Uses GPT-4o for intelligent event/update data extraction
  • πŸ“Š Airtable Integration: Automatically saves events and updates to organized tables
  • ⚑ Fast Processing: ~5-10 second response times
  • πŸ›‘οΈ Robust Error Handling: Graceful failures with helpful user feedback
  • πŸ“ˆ Weekly Summaries: Generate newsletter-style event and update summaries

πŸ—οΈ Architecture

User Input (URL) β†’ Playwright (Render Page) β†’ OpenAI (Extract Data) β†’ Airtable (Save) β†’ User Feedback

Why This Approach?

  • Playwright: Handles modern JavaScript-heavy event platforms
  • OpenAI GPT-4o: Intelligent, context-aware data extraction
  • Direct Integration: No third-party scraping services, full control
  • Cost Effective: Only OpenAI API costs (~$20-50/month typical usage)

πŸ“‹ Commands

  • /start - Welcome message and usage guide
  • /weeklyweave - Generate weekly summary of events and updates

πŸ’¬ Message Formats

Event Extraction

event: https://lu.ma/event-link
event: https://meetup.com/group/events/123456
event: https://eventbrite.com/e/event-name-123456

Update Processing

update: https://techcrunch.com/article-link
update: Just wanted to share that our meetup went great!

🌐 Supported Websites

βœ… Excellent Support

  • Lu.ma events - Full dynamic content support
  • Meetup.com - Comprehensive event details
  • News sites - TechCrunch, Wired, etc.
  • Simple event pages - Static HTML sites
  • Blog posts - Personal and corporate blogs

⚠️ Limited Support

  • Eventbrite - May be blocked due to anti-bot measures
  • Facebook Events - Requires authentication
  • LinkedIn Events - Anti-scraping protection

πŸ”§ Environment Variables

Required

TELEGRAM_BOT_TOKEN=your_telegram_bot_token
OPENAI_API_KEY=your_openai_api_key
AIRTABLE_API_KEY=your_airtable_api_key
AIRTABLE_BASE_ID=your_airtable_base_id
AIRTABLE_TABLE_NAME=Events

Optional

AIRTABLE_TABLE_ID=optional_events_table_id
AIRTABLE_VIEW_ID=optional_events_view_id
AIRTABLE_UPDATES_TABLE_NAME=Updates
AIRTABLE_UPDATES_TABLE_ID=optional_updates_table_id
AIRTABLE_UPDATES_VIEW_ID=optional_updates_view_id

πŸš€ Deployment

Option 1: Render (Recommended)

  1. Fork this repository
  2. Connect to Render
  3. Set environment variables
  4. Deploy as Worker service

Option 2: Docker

# Build image
docker build -t weavebot .

# Run container
docker run -d \
  --name weavebot \
  -e TELEGRAM_BOT_TOKEN=your_token \
  -e OPENAI_API_KEY=your_key \
  -e AIRTABLE_API_KEY=your_key \
  -e AIRTABLE_BASE_ID=your_base_id \
  -e AIRTABLE_TABLE_NAME=Events \
  weavebot

Option 3: Local Development

# Install dependencies
pip install -r requirements.txt

# Install Playwright browsers
playwright install chromium

# Set environment variables in .env file
cp .env.example .env
# Edit .env with your keys

# Run the bot
python bot.py

πŸ“Š Data Structure

Events Table (Airtable)

  • Event Title (Text)
  • Description (Long Text)
  • Start Datetime (Date/Time)
  • End Datetime (Date/Time)
  • Location (Text)
  • Link (URL)

Updates Table (Airtable)

  • Content (Long Text)
  • Received At (Date/Time - auto-generated)

πŸƒβ€β™‚οΈ Performance

  • Cold Start: ~5-10 seconds
  • Warm Processing: ~3-5 seconds
  • Memory Usage: ~150-200MB
  • Browser Overhead: Minimal (headless Chromium)

πŸ”„ Migration from ScrapeGraphAI

This version removes ScrapeGraphAI in favor of a cleaner architecture:

Before (Issues)

  • Complex setup with multiple dependencies
  • ScrapeGraphAI reliability issues
  • Credit-based pricing confusion
  • Performance overhead

After (Benefits)

  • Direct Playwright + OpenAI integration
  • Predictable OpenAI-only costs
  • Better error handling and logging
  • Faster processing times

πŸ§ͺ Testing

WeaveBot includes a comprehensive test suite with 22 tests covering all functionality:

Quick Testing

# Run all tests
python3 run_tests.py all

# Run only unit tests (fast)
python3 run_tests.py unit

# Run with coverage report
python3 run_tests.py coverage

Test Coverage

  • βœ… Date validation and formatting
  • βœ… OpenAI data extraction with mocking
  • βœ… Playwright browser automation
  • βœ… Airtable integration and data mapping
  • βœ… Newsletter generation and formatting
  • βœ… End-to-end workflow testing
  • βœ… Comprehensive error handling

CI/CD

  • GitHub Actions: Automated testing on push/PR
  • Multiple Python versions: 3.9, 3.10, 3.11
  • Code quality: Linting with flake8, black, isort
  • Coverage reporting: Integrated with Codecov

See Testing Guide for detailed documentation.

πŸ› οΈ Development

Project Structure

WeaveBot/
β”œβ”€β”€ bot.py              # Main bot logic
β”œβ”€β”€ test_bot.py         # Comprehensive test suite
β”œβ”€β”€ run_tests.py        # Test runner script
β”œβ”€β”€ pytest.ini         # Test configuration
β”œβ”€β”€ requirements.txt    # Python dependencies
β”œβ”€β”€ Dockerfile         # Container configuration
β”œβ”€β”€ render.yaml        # Render deployment config
β”œβ”€β”€ docs/              # Documentation
β”‚   β”œβ”€β”€ testing.md     # Testing guide
β”‚   └── python-revert-analysis.md
└── README.md          # This file

Key Components

  • Event Processing: scrape_event_data() + extract_event_data_with_openai()
  • Update Processing: scrape_update_data() + extract_update_data_with_openai()
  • Browser Automation: get_html_with_playwright()
  • Data Storage: save_event_to_airtable() + save_update_to_airtable()

πŸ› Troubleshooting

Common Issues

Bot not responding

  • Check Telegram bot token
  • Verify internet connectivity
  • Check logs for error messages

Scraping failures

  • Some sites block automated access
  • Try different event platforms (Lu.ma, Meetup)
  • Check if URL is accessible manually

Airtable errors

  • Verify API key and base ID
  • Check table names match exactly
  • Ensure required fields exist in tables

Logging

The bot provides detailed logging for debugging:

# View logs in production
docker logs weavebot

# Local development
python bot.py  # Logs print to console

πŸ“ˆ Usage Analytics

Track your bot usage:

  • Successful Events: Check Airtable Events table
  • Updates Processed: Check Airtable Updates table
  • Error Rates: Monitor application logs
  • Response Times: Built-in timing logs

🀝 Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Test thoroughly
  5. Submit a pull request

πŸ“„ License

MIT License - see LICENSE file for details

πŸ™‹β€β™‚οΈ Support

For issues or questions:

  1. Check the troubleshooting section
  2. Review application logs
  3. Open a GitHub issue with details

Built with ❀️ using Python, Playwright, and OpenAI GPT-4o

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •