A powerful Streamlit application that scrapes content from any website and generates intelligent summaries using AI. Supports both OpenAI's API and local Ollama endpoints for flexible deployment.
- Smart Web Scraping: Extracts clean content using BeautifulSoup with intelligent filtering
- AI-Powered Summarization: Generates comprehensive summaries using LLMs
- Flexible LLM Backend: Choose between OpenAI, OpenRouter, or local Ollama endpoint
- Dynamic Model Loading: Automatically fetches available models from API providers
- Caching: Built-in caching to prevent re-scraping the same URLs
- Error Handling: Robust error handling for network issues, parsing errors, and API failures
- Modern UI: Clean, responsive interface with real-time status updates
- Python 3.8 or higher
- OpenAI API key (if using OpenAI backend)
- OpenRouter API key (if using OpenRouter backend)
- Ollama installed locally (if using Ollama backend)
- Clone the repository: `git clone <repository-url> && cd Website-Summarizer`
- Install dependencies: `pip install -r requirements.txt`
- Configure secrets (see Configuration section below)
- Run the application: `streamlit run app.py`
- Open your browser to `http://localhost:8501`
The application uses Streamlit's secrets management. Create a `.streamlit/secrets.toml` file with your configuration:
```toml
OPENAI_API_KEY = "sk-your-openai-api-key-here"
OPENROUTER_API_KEY = "sk-or-your-openrouter-api-key-here"
OLLAMA_ENDPOINT_URL = "http://localhost:11434/v1"
```
See `.streamlit/secrets.toml.example` for a complete template.
- Models: gpt-4o-mini (recommended), gpt-4o, gpt-3.5-turbo
- Setup: Get API key from OpenAI Platform
- Cost: Pay-per-use based on token consumption
- Models: Access to 200+ models including Claude, Llama, Mistral, and more
- Setup: Get API key from OpenRouter
- Cost: Pay-per-use with competitive pricing across multiple providers
- Models: llama2, mistral, codellama, etc.
- Setup: Install Ollama and pull a model
- Cost: Free (runs locally on your machine)
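All three backends expose an OpenAI-compatible chat-completions API, so a single request builder can serve them. The sketch below illustrates that idea; the base URLs, model names, and prompt wording are illustrative assumptions, not values taken from the app's code.

```python
# Sketch: build an OpenAI-compatible chat-completions request for any backend.
# Endpoint URLs and the prompt text are illustrative, not the app's exact values.
BACKENDS = {
    "openai": "https://api.openai.com/v1",
    "openrouter": "https://openrouter.ai/api/v1",
    "ollama": "http://localhost:11434/v1",
}

def build_summary_request(backend: str, model: str, text: str, title: str) -> dict:
    """Return the base URL and JSON payload for a summarization call."""
    return {
        "base_url": BACKENDS[backend],
        "payload": {
            "model": model,
            "messages": [
                {"role": "system", "content": "Summarize the following web page."},
                {"role": "user", "content": f"Title: {title}\n\n{text}"},
            ],
        },
    }
```

Because the payload shape is identical everywhere, switching backends only means pointing the client's `base_url` at a different endpoint.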
- Enter URL: Paste the website URL you want to summarize
- Select Backend: Choose OpenAI, OpenRouter, or Ollama from the dropdown
- Configure: Set up your API key or endpoint URL (if not in secrets)
- Generate: Click "Generate Summary" and wait for results
- Review:
  - Check the raw scraped content in the expandable section
  - Read the AI-generated summary in the main area
- News articles: https://www.bbc.com/news
- Documentation: https://docs.streamlit.io
- Company websites: https://openai.com
- Educational content: https://www.khanacademy.org
```
Website-Summarizer/
├── app.py                    # Main Streamlit application
├── requirements.txt          # Python dependencies
├── README.md                 # This file
├── .gitignore                # Git ignore rules
└── .streamlit/
    ├── secrets.toml.example  # Secrets template
    └── secrets.toml          # Your secrets (not in git)
```
- `scrape_and_clean(url)`: Fetches HTML, parses it with BeautifulSoup, and removes noise elements
- `summarize_content(text, title, llm_backend, ...)`: Calls the selected LLM API for summarization
- Caching: Uses `@st.cache_data` to cache scraped content for 5 minutes
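The cleaning step can be approximated with the standard library alone. This is a hedged sketch of the same idea, not the app's implementation: the app itself uses BeautifulSoup and wraps the function in `@st.cache_data`, and the `NOISE_TAGS` set here is an illustrative guess at which elements get filtered.

```python
from html.parser import HTMLParser

# Stdlib-only sketch of the noise-filtering step; the actual app uses
# BeautifulSoup and caches the result with @st.cache_data for 5 minutes.
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside"}  # assumed set

class TextExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # >0 while inside a noise element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())

def clean_html(html: str) -> str:
    """Strip markup and noise elements, keeping readable text."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)
```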
- Network errors: Connection timeouts, 404 errors, SSL issues
- Parsing errors: Malformed HTML, encoding issues
- API errors: Invalid keys, model not found, rate limits
- Smart caching: Prevents re-scraping identical URLs
- Session state: Preserves user inputs across interactions
- Loading indicators: Visual feedback during operations
- Responsive UI: Works on desktop and mobile devices
`streamlit run app.py`
- Push your code to GitHub
- Connect your repository to Streamlit Cloud
- Add your secrets in the Streamlit Cloud dashboard
- Deploy with one click
```dockerfile
FROM python:3.9-slim

WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY . .

EXPOSE 8501
CMD ["streamlit", "run", "app.py", "--server.port=8501", "--server.address=0.0.0.0"]
```
- Heroku: Use the Procfile: `web: streamlit run app.py --server.port=$PORT --server.address=0.0.0.0`
- Railway: Deploy directly from GitHub
- AWS/GCP/Azure: Use container services
- Never commit your API keys to version control
- Use environment variables or Streamlit secrets for sensitive data
- Consider rate limiting for production deployments
- Validate and sanitize URLs to prevent SSRF attacks
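The SSRF guard mentioned above can be sketched with the standard library. This is a minimal illustration, not the app's code; a production check should also resolve DNS and re-validate the resolved address, since a hostname can point at an internal IP.

```python
import ipaddress
from urllib.parse import urlparse

# Sketch of URL validation to reduce SSRF risk; checks schemes and IP
# literals only. The blocked-host list is an illustrative assumption.
BLOCKED_HOSTS = {"localhost"}

def is_safe_url(url: str) -> bool:
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False  # reject file://, ftp://, etc.
    host = (parsed.hostname or "").lower()
    if host in BLOCKED_HOSTS:
        return False
    try:
        addr = ipaddress.ip_address(host)
    except ValueError:
        return True  # a hostname, not an IP literal
    # Reject private, loopback, and link-local address literals.
    return not (addr.is_private or addr.is_loopback or addr.is_link_local)
```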
- Fork the repository
- Create a feature branch: `git checkout -b feature-name`
- Commit changes: `git commit -am 'Add feature'`
- Push to branch: `git push origin feature-name`
- Submit a pull request
This project is licensed under the MIT License - see the LICENSE file for details.
"OPENAI_API_KEY not found in secrets"
- Add your API key to
.streamlit/secrets.toml
- Restart the Streamlit app after adding secrets
"Connection refused" (Ollama)
- Ensure Ollama is running:
ollama serve
- Check the endpoint URL in your configuration
- Verify the model is installed:
ollama list
"Parsing error"
- Some websites use JavaScript to load content
- Try a different URL or check if the site is accessible
"Network error"
- Check your internet connection
- Some websites block automated requests
- Try using a different User-Agent header
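Sending a browser-like User-Agent often gets past sites that reject default client headers. A stdlib sketch, where the UA string and function name are illustrative assumptions:

```python
import urllib.request

def make_request(url: str) -> urllib.request.Request:
    """Build a request with a custom User-Agent (string is an example)."""
    return urllib.request.Request(
        url,
        headers={"User-Agent": "Mozilla/5.0 (compatible; WebsiteSummarizer/1.0)"},
    )
```

With `requests`, the equivalent is passing the same dict via the `headers=` keyword.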
- Check the Streamlit documentation
- Review the OpenAI API documentation
- Visit the Ollama documentation
- Open an issue in this repository
Made with ❤️ using Streamlit