WeaveBot is a Telegram bot that extracts event information from web pages, using Playwright for browser automation and OpenAI GPT-4o for data extraction.
- Universal Web Scraping: Handles JavaScript-heavy sites (Lu.ma, Meetup, etc.) with Playwright
- AI-Powered Extraction: Uses GPT-4o for intelligent event/update data extraction
- Airtable Integration: Automatically saves events and updates to organized tables
- Fast Processing: ~5-10 second response times
- Robust Error Handling: Graceful failures with helpful user feedback
- Weekly Summaries: Generate newsletter-style event and update summaries
User Input (URL) → Playwright (Render Page) → OpenAI (Extract Data) → Airtable (Save) → User Feedback
- Playwright: Handles modern JavaScript-heavy event platforms
- OpenAI GPT-4o: Intelligent, context-aware data extraction
- Direct Integration: No third-party scraping services, full control
- Cost Effective: Only OpenAI API costs (~$20-50/month typical usage)
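The flow above can be read as three stages chained in order, with the result reported back to the user. The sketch below only illustrates that shape and is not the bot's actual code: `process_url` and its stage parameters are hypothetical names, and the stages are passed in as plain callables so the example runs without Playwright, OpenAI, or Airtable installed.

```python
from typing import Callable, Dict


def process_url(
    url: str,
    render: Callable[[str], str],               # URL -> rendered HTML (Playwright's role)
    extract: Callable[[str], Dict[str, str]],   # HTML -> structured fields (GPT-4o's role)
    save: Callable[[Dict[str, str]], None],     # fields -> stored record (Airtable's role)
) -> str:
    """Run render -> extract -> save and return a message for the user."""
    try:
        html = render(url)
        fields = extract(html)
        fields["Link"] = url
        save(fields)
    except Exception as exc:
        # Any stage failure becomes a readable reply instead of a crash.
        return f"Could not process {url}: {exc}"
    return f"Saved: {fields.get('Event Title', 'untitled event')}"


# Stand-in stages so the sketch can be exercised without external services.
saved_records = []


def fake_render(url: str) -> str:
    return "<h1>Demo Night</h1>"


def fake_extract(html: str) -> Dict[str, str]:
    return {"Event Title": html.replace("<h1>", "").replace("</h1>", "")}


reply = process_url("https://lu.ma/event-link", fake_render, fake_extract, saved_records.append)
print(reply)
```

Keeping the stages injectable is a design choice for the illustration: it makes each step replaceable in tests, which is one plausible way to mock the browser and the model.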
/start - Welcome message and usage guide
/weeklyweave - Generate weekly summary of events and updates
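As a rough idea of how a weekly summary could be assembled from stored events, here is a minimal sketch. It assumes "weekly" means events starting within the next seven days and that start times are stored as naive ISO 8601 strings; `format_weekly_summary` and the dictionary keys are illustrative, not taken from the bot's code.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def format_weekly_summary(events: List[Dict[str, str]], now: datetime) -> str:
    """Return a plain-text digest of events starting within 7 days of `now`."""
    window_end = now + timedelta(days=7)
    upcoming = []
    for event in events:
        start = datetime.fromisoformat(event["Start Datetime"])
        if now <= start < window_end:
            upcoming.append((start, event))
    # Sort by start time only, so the event dicts are never compared.
    upcoming.sort(key=lambda pair: pair[0])

    if not upcoming:
        return "No events in the next 7 days."
    lines = ["Events this week:"]
    for start, event in upcoming:
        lines.append(f"- {start:%Y-%m-%d %H:%M}: {event['Event Title']} ({event['Link']})")
    return "\n".join(lines)
```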
event: https://lu.ma/event-link
event: https://meetup.com/group/events/123456
event: https://eventbrite.com/e/event-name-123456
update: https://techcrunch.com/article-link
update: Just wanted to share that our meetup went great!
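The examples above follow a `prefix: payload` shape, where the payload is either a URL or free text. A minimal sketch of how such messages might be recognized is shown below; `parse_message` and `URL_PATTERN` are hypothetical names, and the real bot may parse messages differently.

```python
import re
from typing import Optional, Tuple

# Deliberately simple URL check: http(s) scheme and no whitespace.
URL_PATTERN = re.compile(r"^https?://\S+$")


def parse_message(text: str) -> Optional[Tuple[str, str, bool]]:
    """Split 'event: ...' or 'update: ...' into (kind, payload, is_url).

    Returns None when the message has no recognized prefix or an empty payload.
    """
    prefix, sep, payload = text.partition(":")
    kind = prefix.strip().lower()
    payload = payload.strip()
    if not sep or kind not in ("event", "update") or not payload:
        return None
    return kind, payload, bool(URL_PATTERN.match(payload))
```

Splitting on the first colon only keeps the `https://` in the payload intact, and the `is_url` flag lets a caller decide whether the page needs rendering at all.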
- Lu.ma events - Full dynamic content support
- Meetup.com - Comprehensive event details
- News sites - TechCrunch, Wired, etc.
- Simple event pages - Static HTML sites
- Blog posts - Personal and corporate blogs
- Eventbrite - May be blocked due to anti-bot measures
- Facebook Events - Requires authentication
- LinkedIn Events - Anti-scraping protection
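The last three entries in the list above carry caveats. One way to surface them to users before attempting a scrape is a small host lookup, sketched here. This is an illustration only: `LIMITED_HOSTS` and `support_warning` are hypothetical, and nothing in this README says the bot performs such a check.

```python
from typing import Optional
from urllib.parse import urlparse

# Hosts flagged in the list above, paired with the stated reason.
LIMITED_HOSTS = {
    "eventbrite.com": "may be blocked due to anti-bot measures",
    "facebook.com": "requires authentication",
    "linkedin.com": "anti-scraping protection",
}


def support_warning(url: str) -> Optional[str]:
    """Return a caveat for hosts with limited support, or None otherwise."""
    host = (urlparse(url).hostname or "").lower()
    for domain, reason in LIMITED_HOSTS.items():
        # Match the bare domain and its subdomains, but not lookalike hosts.
        if host == domain or host.endswith("." + domain):
            return f"{domain}: {reason}"
    return None
```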
TELEGRAM_BOT_TOKEN=your_telegram_bot_token
OPENAI_API_KEY=your_openai_api_key
AIRTABLE_API_KEY=your_airtable_api_key
AIRTABLE_BASE_ID=your_airtable_base_id
AIRTABLE_TABLE_NAME=Events
AIRTABLE_TABLE_ID=optional_events_table_id
AIRTABLE_VIEW_ID=optional_events_view_id
AIRTABLE_UPDATES_TABLE_NAME=Updates
AIRTABLE_UPDATES_TABLE_ID=optional_updates_table_id
AIRTABLE_UPDATES_VIEW_ID=optional_updates_view_id
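A startup check that fails fast on missing settings can save debugging time. The sketch below assumes the first four variables are required and that the two table names fall back to `Events` and `Updates`; that split is an assumption based on the placeholder values above, and `load_config` is a hypothetical helper.

```python
import os
from typing import Dict, Mapping

# Assumed to be mandatory; the README does not state this explicitly.
REQUIRED_VARS = ("TELEGRAM_BOT_TOKEN", "OPENAI_API_KEY", "AIRTABLE_API_KEY", "AIRTABLE_BASE_ID")
# Defaults taken from the example values above.
DEFAULTS = {"AIRTABLE_TABLE_NAME": "Events", "AIRTABLE_UPDATES_TABLE_NAME": "Updates"}


def load_config(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect settings from `env`, raising with the full list of missing keys."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    config = {name: env[name] for name in REQUIRED_VARS}
    for name, default in DEFAULTS.items():
        config[name] = env.get(name) or default
    return config
```

Passing a plain dict instead of `os.environ` makes the function easy to exercise in tests.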
- Fork this repository
- Connect to Render
- Set environment variables
- Deploy as Worker service
# Build image
docker build -t weavebot .
# Run container
docker run -d \
--name weavebot \
-e TELEGRAM_BOT_TOKEN=your_token \
-e OPENAI_API_KEY=your_key \
-e AIRTABLE_API_KEY=your_key \
-e AIRTABLE_BASE_ID=your_base_id \
-e AIRTABLE_TABLE_NAME=Events \
weavebot
# Install dependencies
pip install -r requirements.txt
# Install Playwright browsers
playwright install chromium
# Set environment variables in .env file
cp .env.example .env
# Edit .env with your keys
# Run the bot
python bot.py
- Event Title (Text)
- Description (Long Text)
- Start Datetime (Date/Time)
- End Datetime (Date/Time)
- Location (Text)
- Link (URL)
- Content (Long Text)
- Received At (Date/Time - auto-generated)
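To show how extracted data might line up with these columns, here is a small mapping sketch. It assumes the first six fields above belong to the Events table; the extractor keys (`title`, `start`, and so on) and the helper `to_event_fields` are hypothetical.

```python
from typing import Dict, Optional

# Hypothetical extractor keys on the left, Airtable column names from the list above on the right.
EVENT_FIELD_MAP = {
    "title": "Event Title",
    "description": "Description",
    "start": "Start Datetime",
    "end": "End Datetime",
    "location": "Location",
}


def to_event_fields(extracted: Dict[str, Optional[str]], url: str) -> Dict[str, str]:
    """Build the fields payload for one Events record."""
    fields = {"Link": url}
    for source_key, column in EVENT_FIELD_MAP.items():
        value = extracted.get(source_key)
        if value:  # skip None and empty strings so the column is left unset
            fields[column] = value
    return fields
```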
- Cold Start: ~5-10 seconds
- Warm Processing: ~3-5 seconds
- Memory Usage: ~150-200MB
- Browser Overhead: Minimal (headless Chromium)
This version removes ScrapeGraphAI in favor of a cleaner architecture:
- Complex setup with multiple dependencies
- ScrapeGraphAI reliability issues
- Credit-based pricing confusion
- Performance overhead
- Direct Playwright + OpenAI integration
- Predictable OpenAI-only costs
- Better error handling and logging
- Faster processing times
WeaveBot includes a comprehensive test suite with 22 tests covering all functionality:
# Run all tests
python3 run_tests.py all
# Run only unit tests (fast)
python3 run_tests.py unit
# Run with coverage report
python3 run_tests.py coverage
- Date validation and formatting
- OpenAI data extraction with mocking
- Playwright browser automation
- Airtable integration and data mapping
- Newsletter generation and formatting
- End-to-end workflow testing
- Comprehensive error handling
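For a sense of what a date-validation unit test could look like, here is a pytest-style sketch. `normalize_datetime` is a hypothetical stand-in and not a function from `bot.py`; it accepts only ISO-like input, which may be stricter than what the bot handles.

```python
from datetime import datetime
from typing import Optional


def normalize_datetime(value: Optional[str]) -> Optional[str]:
    """Return `value` as an ISO 8601 string, or None if it cannot be parsed."""
    if not value:
        return None
    try:
        return datetime.fromisoformat(value.strip()).isoformat()
    except ValueError:
        return None


def test_normalize_datetime():
    # A space separator and missing seconds are normalized.
    assert normalize_datetime("2024-05-02 18:00") == "2024-05-02T18:00:00"
    # Free-form text and empty input are rejected rather than guessed at.
    assert normalize_datetime("next Tuesday") is None
    assert normalize_datetime("") is None
```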
- GitHub Actions: Automated testing on push/PR
- Multiple Python versions: 3.9, 3.10, 3.11
- Code quality: Linting with flake8, black, isort
- Coverage reporting: Integrated with Codecov
See the Testing Guide (docs/testing.md) for detailed documentation.
WeaveBot/
├── bot.py # Main bot logic
├── test_bot.py # Comprehensive test suite
├── run_tests.py # Test runner script
├── pytest.ini # Test configuration
├── requirements.txt # Python dependencies
├── Dockerfile # Container configuration
├── render.yaml # Render deployment config
├── docs/ # Documentation
│   ├── testing.md # Testing guide
│   └── python-revert-analysis.md
└── README.md # This file
- Event Processing: scrape_event_data() + extract_event_data_with_openai()
- Update Processing: scrape_update_data() + extract_update_data_with_openai()
- Browser Automation: get_html_with_playwright()
- Data Storage: save_event_to_airtable() + save_update_to_airtable()
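The extraction functions named above have to turn a model reply into structured data at some point. The sketch below shows one defensive way to do that, assuming the model is asked for a JSON object and sometimes wraps it in a Markdown code fence. `parse_model_json` is a hypothetical helper and makes no OpenAI calls itself.

```python
import json
from typing import Any, Dict, Optional

FENCE = "`" * 3  # Markdown code fence marker


def parse_model_json(reply: str) -> Optional[Dict[str, Any]]:
    """Parse a JSON object from a model reply, tolerating a surrounding code fence."""
    text = reply.strip()
    if text.startswith(FENCE):
        # Drop the opening fence line (with any language tag) and the closing fence.
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    try:
        data = json.loads(text)
    except json.JSONDecodeError:
        return None
    # Only a JSON object is useful as a record; lists or scalars are rejected.
    return data if isinstance(data, dict) else None
```

Returning `None` instead of raising lets the caller reply with a helpful message when extraction fails.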
Bot not responding
- Check Telegram bot token
- Verify internet connectivity
- Check logs for error messages
Scraping failures
- Some sites block automated access
- Try different event platforms (Lu.ma, Meetup)
- Check if URL is accessible manually
Airtable errors
- Verify API key and base ID
- Check table names match exactly
- Ensure required fields exist in tables
The bot provides detailed logging for debugging:
# View logs in production
docker logs weavebot
# Local development
python bot.py # Logs print to console
Track your bot usage:
- Successful Events: Check Airtable Events table
- Updates Processed: Check Airtable Updates table
- Error Rates: Monitor application logs
- Response Times: Built-in timing logs
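Timing logs of the kind mentioned above can be produced with a small context manager. This is a sketch under assumptions: the logger name `weavebot.timing` and the helper `timed` are illustrative, and the bot's built-in timing output may be formatted differently.

```python
import logging
import time
from contextlib import contextmanager

logger = logging.getLogger("weavebot.timing")  # hypothetical logger name


@contextmanager
def timed(step: str):
    """Log how long the wrapped block took, even if it raises."""
    started = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - started
        logger.info("%s finished in %.2fs", step, elapsed)
```

Wrapping each stage, for example `with timed("render"):` around the page load, would give per-step durations in the application logs once the logger is configured at `INFO` level.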
- Fork the repository
- Create a feature branch
- Make your changes
- Test thoroughly
- Submit a pull request
MIT License - see LICENSE file for details
For issues or questions:
- Check the troubleshooting section
- Review application logs
- Open a GitHub issue with details
Built with ❤️ using Python, Playwright, and OpenAI GPT-4o