This Flask-based web application allows users to input multiple URLs for analysis, including sentiment comparison, readability scoring, and word cloud generation. The results are presented in a user-friendly format with graphical visualizations.
`POST /analysis`

Analyzes multiple URLs provided by the user. The URLs are scraped, and the text is analyzed for sentiment, readability, and word frequencies. A word cloud is also generated for each URL.
| Parameter | Type | Description |
| --- | --- | --- |
| `urls` | `list[str]` | Required. List of URLs to analyze. |
| Field | Type | Description |
| --- | --- | --- |
| `sentiment_comparison` | `list[dict]` | Sentiment analysis results for each URL, including positive, neutral, and negative sentiment scores. |
| `readability_comparison` | `list[dict]` | Readability scores for each URL, using the Flesch-Kincaid Grade. |
| `word_clouds` | `list[dict]` | Base64-encoded word cloud images for each URL. |
| `invalid_urls` | `list[dict]` (only on error) | URLs that could not be processed, with the issue encountered (e.g., invalid format, access errors). |
Example request:

```json
{
  "urls": [
    "https://example.com",
    "https://invalid-url"
  ]
}
```
Example response:

```json
{
  "sentiment_comparison": [
    {
      "URL": "URL 1",
      "Positive": 0.12,
      "Neutral": 0.75,
      "Negative": 0.13
    }
  ],
  "readability_comparison": [
    {
      "URL": "URL 1",
      "Readability": 8.5
    }
  ],
  "word_clouds": [
    {
      "URL": "URL 1",
      "word_cloud": "<base64 image string>"
    }
  ]
}
```
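For reference, here is a minimal client sketch for calling this endpoint from Python. It assumes the server is running locally on Flask's default port and accepts JSON as in the request example above; the output filenames are purely illustrative.

```python
import base64

import requests

# Minimal sketch: POST a list of URLs to the local /analysis endpoint.
resp = requests.post(
    "http://127.0.0.1:5000/analysis",
    json={"urls": ["https://example.com"]},
    timeout=60,
)
resp.raise_for_status()
data = resp.json()

# Print per-URL sentiment and readability results.
for item in data.get("sentiment_comparison", []):
    print(item["URL"], item["Positive"], item["Neutral"], item["Negative"])
for item in data.get("readability_comparison", []):
    print(item["URL"], "Flesch-Kincaid Grade:", item["Readability"])

# Decode each base64 word cloud image to a PNG file.
for i, item in enumerate(data.get("word_clouds", []), start=1):
    with open(f"wordcloud_{i}.png", "wb") as f:
        f.write(base64.b64decode(item["word_cloud"]))
```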
`GET /scrape/{url}`
Scrapes a single URL and returns the text content of the page.
| Parameter | Type | Description |
| --- | --- | --- |
| `url` | `string` | Required. The URL to scrape. |
| Field | Type | Description |
| --- | --- | --- |
| `text` | `string` | The text content of the page. |
| `error` | `string` | Any error encountered during scraping. |
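A similar sketch works for the scrape endpoint. How the target URL is encoded into the path is not specified above, so the percent-encoding below is an assumption; adjust it to match the route's actual behavior.

```python
from urllib.parse import quote

import requests

# Minimal sketch: fetch the text content of a single page.
# Assumption: the target URL is percent-encoded into the path.
target = "https://example.com"
resp = requests.get(
    f"http://127.0.0.1:5000/scrape/{quote(target, safe='')}",
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()

if payload.get("error"):
    print("Scrape failed:", payload["error"])
else:
    print(payload["text"][:200])  # first 200 characters of the page text
```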
Frontend:
- HTML: Structure and layout of the web pages.
- CSS: Styling and formatting the UI elements.
- JavaScript: Adds interactivity to the frontend.
Backend:
- Python: Core programming language used for backend logic.
- Flask: Web framework used to handle HTTP requests and server-side logic.
Additional Libraries (a usage sketch follows this list):
- Requests: For making HTTP requests to scrape the URLs.
- BeautifulSoup: For parsing and extracting text from HTML content.
- ThreadPoolExecutor (from the standard library's concurrent.futures): For scraping multiple URLs concurrently.
- NLTK: Used for sentiment analysis with the VADER sentiment analyzer.
- TextStat: To calculate readability scores.
- WordCloud: For generating word clouds based on word frequencies.
- Matplotlib: For rendering the word cloud images.
- CacheTools: To cache the results of scraped URLs and reduce repeated requests.
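To make the division of labor concrete, here is a minimal, self-contained sketch of the kind of pipeline these libraries enable. It is illustrative rather than the app's actual code: `analyze_url` is a hypothetical name, and the VADER lexicon download is included for a fresh environment.

```python
import base64
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen, as a server process would
import matplotlib.pyplot as plt
import nltk
import requests
import textstat
from bs4 import BeautifulSoup
from nltk.sentiment import SentimentIntensityAnalyzer
from wordcloud import WordCloud

nltk.download("vader_lexicon", quiet=True)  # VADER lexicon, needed once per environment


def analyze_url(url: str) -> dict:
    """Illustrative pipeline: scrape a page, score it, and build a word cloud."""
    html = requests.get(url, timeout=10).text
    text = BeautifulSoup(html, "html.parser").get_text(separator=" ", strip=True)

    scores = SentimentIntensityAnalyzer().polarity_scores(text)  # NLTK / VADER
    grade = textstat.flesch_kincaid_grade(text)                  # TextStat

    cloud = WordCloud(width=800, height=400).generate(text)      # WordCloud
    buf = io.BytesIO()
    plt.figure(figsize=(8, 4))                                   # Matplotlib
    plt.imshow(cloud, interpolation="bilinear")
    plt.axis("off")
    plt.savefig(buf, format="png", bbox_inches="tight")
    plt.close()

    return {
        "sentiment": {"Positive": scores["pos"], "Neutral": scores["neu"], "Negative": scores["neg"]},
        "readability": grade,
        "word_cloud": base64.b64encode(buf.getvalue()).decode("ascii"),
    }
```

With `ThreadPoolExecutor`, several URLs can then be processed concurrently, e.g. `list(ThreadPoolExecutor().map(analyze_url, urls))`, and CacheTools can memoize a function like `analyze_url` so repeated URLs are not re-scraped.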
Start by cloning the project repository from GitHub:

```bash
git clone https://github.com/Tarpit59/Dynamic-Web-Content-Analyzer.git
cd Dynamic-Web-Content-Analyzer
```
Create a virtual environment using venv with Python 3.9.7. This ensures the project dependencies are isolated.

```bash
python3.9 -m venv venv
```

Then activate it:

- On Windows:

  ```bash
  venv\Scripts\activate
  ```

- On macOS/Linux:

  ```bash
  source venv/bin/activate
  ```
Install all required dependencies from requirements.txt:

```bash
pip install -r requirements.txt
```
Start the Flask development server:

```bash
python app.py
```

This starts the server at http://127.0.0.1:5000/. You can now visit this URL in your browser to access the app.
To run the test suite, activate the virtual environment first:

- On Windows:

  ```bash
  venv\Scripts\activate
  ```

- On macOS/Linux:

  ```bash
  source venv/bin/activate
  ```

Then run the tests from the base directory:

```bash
cd base
pytest -v
```
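If you add tests of your own, a minimal sketch using Flask's built-in test client could look like the following. It assumes app.py exposes a module-level `app` object (the usual Flask convention) and that the endpoint accepts JSON as in the request example above; note that it drives the real scraping code, so it needs network access.

```python
# test_analysis.py -- illustrative sketch, not part of the existing suite.
from app import app  # assumes app.py defines a Flask `app` object


def test_analysis_returns_expected_fields():
    client = app.test_client()
    # This exercises the real scraping pipeline, so it requires network access.
    resp = client.post("/analysis", json={"urls": ["https://example.com"]})
    assert resp.status_code == 200
    data = resp.get_json()
    # The response should include the comparison fields documented above.
    assert "sentiment_comparison" in data
    assert "readability_comparison" in data
    assert "word_clouds" in data
```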