This Flask-based web application allows users to input multiple URLs for analysis, including sentiment comparison, readability scoring, and word cloud generation. The results are presented in a user-friendly format with graphical visualizations.
`POST /analysis`

Analyzes multiple URLs provided by the user. The URLs are scraped, and the text is analyzed for sentiment, readability, and word frequencies. A word cloud is also generated for each URL.
| Parameter | Type | Description |
| --- | --- | --- |
| `urls` | `list[str]` | Required. List of URLs to analyze. |
| Field | Type | Description |
| --- | --- | --- |
| `sentiment_comparison` | `list[dict]` | Sentiment analysis results for each URL, including positive, neutral, and negative sentiment scores. |
| `readability_comparison` | `list[dict]` | Readability scores for each URL, using the Flesch-Kincaid Grade. |
| `word_clouds` | `list[dict]` | Base64-encoded word cloud images for each URL. |
| `invalid_urls` | `list[dict]` (only on error) | URLs that could not be processed, with the issue encountered (e.g., invalid format, access errors). |
Example request:

```json
{
  "urls": [
    "https://example.com",
    "https://invalid-url"
  ]
}
```
Example response:

```json
{
  "sentiment_comparison": [
    {
      "URL": "URL 1",
      "Positive": 0.12,
      "Neutral": 0.75,
      "Negative": 0.13
    }
  ],
  "readability_comparison": [
    {
      "URL": "URL 1",
      "Readability": 8.5
    }
  ],
  "word_clouds": [
    {
      "URL": "URL 1",
      "word_cloud": "<base64 image string>"
    }
  ]
}
```
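For reference, here is a minimal client sketch for calling this endpoint from Python. It assumes the server is running locally on Flask's default port and accepts JSON as in the request example above; the output filenames are purely illustrative.

```python
import base64

import requests

# Minimal sketch: POST a list of URLs to the local /analysis endpoint.
resp = requests.post(
    "http://127.0.0.1:5000/analysis",
    json={"urls": ["https://example.com"]},
    timeout=60,
)
resp.raise_for_status()
data = resp.json()

# Print per-URL sentiment and readability results.
for item in data.get("sentiment_comparison", []):
    print(item["URL"], item["Positive"], item["Neutral"], item["Negative"])
for item in data.get("readability_comparison", []):
    print(item["URL"], "Flesch-Kincaid Grade:", item["Readability"])

# Decode each base64 word cloud image to a PNG file.
for i, item in enumerate(data.get("word_clouds", []), start=1):
    with open(f"wordcloud_{i}.png", "wb") as f:
        f.write(base64.b64decode(item["word_cloud"]))
```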
`GET /scrape/{url}`
Scrapes a single URL and returns the text content of the page.
| Parameter | Type | Description |
| --- | --- | --- |
| `url` | `string` | Required. The URL to scrape. |
| Field | Type | Description |
| --- | --- | --- |
| `text` | `string` | The text content of the page. |
| `error` | `string` | Any error encountered during scraping. |
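A similar sketch works for the scrape endpoint. How the target URL is encoded into the path is not specified above, so the percent-encoding below is an assumption; adjust it to match the route's actual behavior.

```python
from urllib.parse import quote

import requests

# Minimal sketch: fetch the text content of a single page.
# Assumption: the target URL is percent-encoded into the path.
target = "https://example.com"
resp = requests.get(
    f"http://127.0.0.1:5000/scrape/{quote(target, safe='')}",
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()

if payload.get("error"):
    print("Scrape failed:", payload["error"])
else:
    print(payload["text"][:200])  # first 200 characters of the page text
```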
Frontend:
- HTML: Structure and layout of the web pages.
- CSS: Styling and formatting the UI elements.
- JavaScript: Adds interactivity to the frontend.
Backend:
- Python: Core programming language used for backend logic.
- Flask: Web framework used to handle HTTP requests and server-side logic.
Additional Libraries (a usage sketch follows this list):
- Requests: For making HTTP requests to scrape the URLs.
- BeautifulSoup: For parsing and extracting text from HTML content.
- ThreadPoolExecutor (from the standard library's concurrent.futures): For scraping multiple URLs concurrently.
- NLTK: Used for sentiment analysis with the VADER sentiment analyzer.
- TextStat: To calculate readability scores.
- WordCloud: For generating word clouds based on word frequencies.
- Matplotlib: For rendering the word cloud images.
- CacheTools: To cache the results of scraped URLs and reduce repeated requests.
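To make the division of labor concrete, here is a minimal, self-contained sketch of the kind of pipeline these libraries enable. It is illustrative rather than the app's actual code: `analyze_url` is a hypothetical name, and the VADER lexicon download is included for a fresh environment.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen, as a server process would
import matplotlib.pyplot as plt
import nltk
import requests
import textstat
from bs4 import BeautifulSoup
from nltk.sentiment import SentimentIntensityAnalyzer
from wordcloud import WordCloud

nltk.download("vader_lexicon", quiet=True)  # VADER lexicon, needed once per environment


def analyze_url(url: str) -> dict:
    """Illustrative pipeline: scrape a page, score it, and build a word cloud."""
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

    scores = SentimentIntensityAnalyzer().polarity_scores(text)  # NLTK / VADER
    grade = textstat.flesch_kincaid_grade(text)                  # TextStat

    cloud = WordCloud(width=800, height=400).generate(text)      # WordCloud
    buf = io.BytesIO()
    plt.figure(figsize=(8, 4))                                   # Matplotlib
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.savefig(buf, format="png", bbox_inches="tight")
    plt.close()

    return {
        "sentiment": {"Positive": scores["pos"], "Neutral": scores["neu"], "Negative": scores["neg"]},
        "readability": grade,
        "word_cloud": base64.b64encode(buf.getvalue()).decode("ascii"),
    }
```

With `ThreadPoolExecutor`, several URLs can then be processed concurrently, e.g. `list(ThreadPoolExecutor().map(analyze_url, urls))`, and CacheTools can memoize a function like `analyze_url` so repeated URLs are not re-scraped.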
Start by cloning the project repository from GitHub:

```bash
git clone https://github.com/Tarpit59/Dynamic-Web-Content-Analyzer.git
cd Dynamic-Web-Content-Analyzer
```
Create a virtual environment using venv with Python 3.9.7. This ensures the project dependencies are isolated.

```bash
python3.9 -m venv venv
```

Then activate it:

- On Windows:

  ```bash
  venv\Scripts\activate
  ```

- On macOS/Linux:

  ```bash
  source venv/bin/activate
  ```
Install all required dependencies from requirements.txt:

```bash
pip install -r requirements.txt
```
Start the Flask development server:

```bash
python app.py
```

This starts the server at http://127.0.0.1:5000/. You can now visit this URL in your browser to access the app.
To run the test suite, activate the virtual environment first:

- On Windows:

  ```bash
  venv\Scripts\activate
  ```

- On macOS/Linux:

  ```bash
  source venv/bin/activate
  ```

Then run the tests from the base directory:

```bash
cd base
pytest -v
```
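If you add tests of your own, a minimal sketch using Flask's built-in test client could look like the following. It assumes app.py exposes a module-level `app` object (the usual Flask convention) and that the endpoint accepts JSON as in the request example above; note that it drives the real scraping code, so it needs network access.

```python
# test_analysis.py -- illustrative sketch, not part of the existing suite.
from app import app  # assumes app.py defines a Flask `app` object


def test_analysis_returns_expected_fields():
    client = app.test_client()
    # This exercises the real scraping pipeline, so it requires network access.
    resp = client.post("/analysis", json={"urls": ["https://example.com"]})
    assert resp.status_code == 200
    data = resp.get_json()
    # The response should include the comparison fields documented above.
    assert "sentiment_comparison" in data
    assert "readability_comparison" in data
    assert "word_clouds" in data
```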