This is a Streamlit-based news aggregation application that fetches articles about a company from multiple news APIs concurrently, then uses AI-powered deduplication to keep a single article per news event. The application supports six news providers and includes an AI service that performs the deduplication.
Preferred communication style: Simple, everyday language.
- Framework: Streamlit web application framework
- Layout: Wide layout configuration with sidebar for API key management
- Components: Interactive forms, data tables, and error handling displays
- User Interface: Clean, responsive design with expandable article containers
- Pattern: Provider-based architecture with async/await for parallel API calls
- Core Components:
  - Provider layer for API integrations
  - Service layer for AI processing
  - Utility layer for common functions
- Concurrency: Asynchronous HTTP requests for parallel news fetching
- User inputs company name and API keys
- Multiple providers fetch news simultaneously
- Articles are normalized to common format
- AI service deduplicates articles
- Results displayed in formatted table
- Base Provider: Abstract class defining common interface and utilities
- NewsAPI Provider: Integration with NewsAPI.org
- NewsData Provider: Integration with NewsData.io
- Finlight Provider: Integration with Finlight.me financial news
- Google RSS Provider: RSS feed parsing from Google News
- Finnhub Provider: Financial news from Finnhub.io
- AlphaVantage Provider: Market news and sentiment from Alpha Vantage
- AI Service: DeepSeek AI integration for intelligent article deduplication
- Functionality: Identifies unique news events from potentially duplicate articles
- API: OpenAI-compatible client with custom base URL
- Display Utils: Streamlit UI formatting and article presentation
- Date Utils: Date parsing, formatting, and time calculations
- Input Phase: User provides company name and API keys through Streamlit interface
- Fetching Phase: All providers fetch news articles concurrently using asyncio
- Normalization Phase: Articles from different APIs are normalized to common schema
- Deduplication Phase: AI service analyzes articles to remove duplicates
- Display Phase: Unique articles presented in formatted, interactive table
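
The normalization phase can be illustrated with a minimal sketch. The common schema (`title`, `url`, `source`, `published_at`, `provider`) and the function names are assumptions made for this example, and the raw field names follow the typical shape of NewsAPI and Finnhub responses rather than anything confirmed by this project.

```python
from datetime import datetime, timezone
from typing import Any


def normalize_newsapi(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a NewsAPI-style article onto the assumed common schema."""
    return {
        "title": raw.get("title", ""),
        "url": raw.get("url", ""),
        # NewsAPI-style payloads nest the source name in an object.
        "source": (raw.get("source") or {}).get("name", "NewsAPI"),
        "published_at": raw.get("publishedAt", ""),
        "provider": "newsapi",
    }


def normalize_finnhub(raw: dict[str, Any]) -> dict[str, Any]:
    """Map a Finnhub-style article onto the assumed common schema."""
    ts = raw.get("datetime")  # assumed to be a Unix timestamp in seconds
    published = ""
    if ts is not None:
        published = datetime.fromtimestamp(ts, tz=timezone.utc).strftime(
            "%Y-%m-%dT%H:%M:%SZ"
        )
    return {
        "title": raw.get("headline", ""),
        "url": raw.get("url", ""),
        "source": raw.get("source", "Finnhub"),
        "published_at": published,
        "provider": "finnhub",
    }
```

Once every provider emits the same keys, the deduplication and display phases can treat all articles identically.
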
- NewsAPI.org: General news articles with search capabilities
- NewsData.io: International news data service
- Finlight.me: Financial news focused on market data
- Google RSS: Free RSS feeds from Google News
- Finnhub.io: Financial market news and data
- Alpha Vantage: Stock market news and sentiment analysis
- DeepSeek AI: Used for intelligent article deduplication
- OpenAI Client: Compatible client library for API communication
- Streamlit: Web application framework
- aiohttp: Asynchronous HTTP client for API calls
- pandas: Data manipulation and analysis
- PIL (Pillow): Image processing for article thumbnails
- xml.etree.ElementTree: XML parsing for RSS feeds
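
As an illustration of the RSS dependency, here is a minimal sketch of parsing feed items with `xml.etree.ElementTree`. The sample feed, the `parse_rss_items` name, and the output keys are assumptions for this example; a real Google News feed carries more fields per item.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written RSS 2.0 document standing in for a fetched feed.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>Acme opens new plant</title>
      <link>https://example.com/acme-plant</link>
      <pubDate>Mon, 01 Jan 2024 09:00:00 GMT</pubDate>
    </item>
    <item>
      <title>Acme reports quarterly results</title>
      <link>https://example.com/acme-results</link>
      <pubDate>Tue, 02 Jan 2024 12:30:00 GMT</pubDate>
    </item>
  </channel>
</rss>"""


def parse_rss_items(xml_text: str, limit: int = 10) -> list[dict[str, str]]:
    """Return up to `limit` items as dicts with title, url, and published_at."""
    root = ET.fromstring(xml_text)
    items: list[dict[str, str]] = []
    for item in root.iter("item"):
        if len(items) >= limit:
            break
        items.append(
            {
                "title": item.findtext("title", default=""),
                "url": item.findtext("link", default=""),
                "published_at": item.findtext("pubDate", default=""),
            }
        )
    return items
```

The `limit` default of 10 mirrors the per-provider article cap described in this document.
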
- API keys managed through environment variables and Streamlit sidebar inputs
- Flexible configuration allowing users to provide keys at runtime
- Graceful degradation when API keys are missing
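
One way to combine environment variables, sidebar input, and graceful degradation is sketched below. The environment variable names, the `resolve_api_keys` helper, and the rule that sidebar input overrides the environment are all assumptions for illustration, not the project's confirmed behavior.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names, one per keyed provider.
ENV_VARS = {
    "newsapi": "NEWSAPI_KEY",
    "newsdata": "NEWSDATA_KEY",
    "finlight": "FINLIGHT_KEY",
    "finnhub": "FINNHUB_KEY",
    "alphavantage": "ALPHAVANTAGE_KEY",
}


def resolve_api_keys(
    sidebar_input: Mapping[str, str],
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Pick a key per provider; providers without any key are left out."""
    env = os.environ if env is None else env
    resolved: dict[str, str] = {}
    for provider, var in ENV_VARS.items():
        # Assumed precedence: a key typed in the sidebar wins over the environment.
        key = (sidebar_input.get(provider) or env.get(var) or "").strip()
        if key:
            resolved[provider] = key
    return resolved
```

Providers missing from the returned dictionary can simply be skipped during fetching, so the remaining sources still produce results.
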
- Provider-level error handling with specific error messages
- Rate limiting detection and user feedback
- Invalid API key detection and guidance
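
The error-handling points above could translate into a small helper like the following sketch. The status-code mapping and the message wording are assumptions; some APIs report rate limits or bad keys in the response body rather than through HTTP status codes, so a real provider may need extra checks.

```python
def describe_http_error(provider: str, status: int) -> str:
    """Turn an HTTP status code into a provider-specific, user-facing message."""
    if status in (401, 403):
        return f"{provider}: the API key was rejected. Check the key in the sidebar."
    if status == 429:
        return f"{provider}: rate limit reached. Wait a while before retrying."
    if status >= 500:
        return f"{provider}: the service is unavailable (HTTP {status})."
    return f"{provider}: request failed (HTTP {status})."
```
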
- Async/await pattern enables efficient concurrent API calls
- Modular provider architecture allows easy addition of new news sources
- AI-powered deduplication reduces information overload
- Parallel API calls reduce total fetch time
- Article limiting (10 per provider) manages response size
- Efficient data structures for article processing
Problem: Need to integrate multiple news APIs with different interfaces and authentication methods.
Solution: Abstract base provider class with concrete implementations for each API.
Benefits:
- Consistent interface across all news sources
- Easy to add new providers
- Shared utilities for date formatting and article normalization
- Independent error handling per provider
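
A minimal sketch of the abstract base provider idea follows. The class and method names (`BaseProvider`, `fetch_raw`, `normalize`, `StubProvider`) are hypothetical, and the stub returns canned data instead of calling a real API.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Optional


class BaseProvider(ABC):
    """Common interface: subclasses supply the API call and the field mapping."""

    name = "base"
    max_articles = 10  # per-provider cap mentioned in this document

    def __init__(self, api_key: Optional[str] = None) -> None:
        self.api_key = api_key

    @abstractmethod
    async def fetch_raw(self, company: str) -> list[dict]:
        """Call the provider's API and return its raw article objects."""

    @abstractmethod
    def normalize(self, raw: dict) -> dict:
        """Convert one raw article into the shared article format."""

    async def fetch(self, company: str) -> list[dict]:
        # Shared logic: fetch, cap the article count, then normalize each item.
        raw_articles = await self.fetch_raw(company)
        return [self.normalize(raw) for raw in raw_articles[: self.max_articles]]


class StubProvider(BaseProvider):
    """Stand-in provider that fabricates articles for demonstration only."""

    name = "stub"

    async def fetch_raw(self, company: str) -> list[dict]:
        return [{"headline": f"{company} story {i}"} for i in range(15)]

    def normalize(self, raw: dict) -> dict:
        return {"title": raw["headline"], "provider": self.name}
```

Adding a new source then means writing one subclass with two methods, while the cap and the fetch flow stay in the base class.
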
Problem: Sequential API calls would be slow and inefficient.
Solution: Async/await pattern with concurrent execution.
Benefits:
- Concurrent API calls cut total fetch time to roughly that of the slowest provider, rather than the sum of all providers
- Better user experience with faster results
- Efficient resource utilization
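
The concurrent execution pattern can be sketched with `asyncio.gather`. The `fake_fetch` coroutine simulates a provider with `asyncio.sleep` instead of a real HTTP request, and `fetch_all` is a hypothetical helper name; `return_exceptions=True` is one way to keep a failing provider from cancelling the others.

```python
import asyncio


async def fake_fetch(name: str, delay: float, fail: bool = False) -> list[dict]:
    """Simulated provider call: waits, then returns one article or raises."""
    await asyncio.sleep(delay)
    if fail:
        raise RuntimeError(f"{name}: invalid API key")
    return [{"provider": name, "title": f"{name} headline"}]


async def fetch_all(jobs: dict) -> tuple[list[dict], dict[str, str]]:
    """Run all provider coroutines concurrently; collect articles and errors."""
    names = list(jobs)
    # Failures come back as exception objects instead of aborting the batch.
    results = await asyncio.gather(*jobs.values(), return_exceptions=True)
    articles: list[dict] = []
    errors: dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            errors[name] = str(result)
        else:
            articles.extend(result)
    return articles, errors


async def demo() -> tuple[list[dict], dict[str, str]]:
    return await fetch_all(
        {
            "newsapi": fake_fetch("newsapi", 0.02),
            "finnhub": fake_fetch("finnhub", 0.03),
            "newsdata": fake_fetch("newsdata", 0.01, fail=True),
        }
    )


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the waits overlap, the whole batch takes about as long as the slowest simulated provider, and the failed provider shows up in `errors` while the others still return articles.
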
Problem: Multiple news sources often report the same story, creating duplicate content.
Solution: DeepSeek AI service analyzes articles for semantic similarity.
Benefits:
- Intelligent deduplication beyond simple text matching
- Focuses on unique news events rather than duplicate reports
- Improves content quality and relevance
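
The sketch below shows one possible shape for the deduplication step around the model call. The prompt wording, the JSON-array reply format, and both function names are assumptions; the actual request to DeepSeek through the OpenAI-compatible client is deliberately left out, so `model_reply` stands for whatever text the model returned.

```python
import json


def build_dedup_prompt(articles: list[dict]) -> str:
    """Number the headlines and ask for one index per unique news event."""
    lines = [f"{i}: {a['title']}" for i, a in enumerate(articles)]
    return (
        "The numbered headlines below may describe the same news event more "
        "than once. Return only a JSON array containing the index of one "
        "representative headline per unique event.\n\n" + "\n".join(lines)
    )


def select_unique(articles: list[dict], model_reply: str) -> list[dict]:
    """Keep the articles the model picked; fall back to all on a bad reply."""
    try:
        indices = json.loads(model_reply)
    except json.JSONDecodeError:
        return list(articles)
    if not isinstance(indices, list):
        return list(articles)
    seen: set[int] = set()
    unique: list[dict] = []
    for i in indices:
        # Ignore non-integer, out-of-range, or repeated indices.
        is_index = isinstance(i, int) and not isinstance(i, bool)
        if is_index and 0 <= i < len(articles) and i not in seen:
            seen.add(i)
            unique.append(articles[i])
    return unique or list(articles)
```

Falling back to the full article list on an unparseable reply means a model failure degrades to "no deduplication" rather than an empty result.
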
Problem: Need rapid development of interactive web interface.
Solution: Streamlit framework with built-in components.
Benefits:
- Rapid prototyping and development
- Built-in data visualization capabilities
- Easy deployment and sharing
- Minimal frontend development required