CyberFeedBites is a lightweight Python tool that provides an overview of recent cybersecurity news by aggregating multiple RSS feeds. It includes a curated, customisable OPML file of cybersecurity news sources.
- Collects and processes cybersecurity news from various RSS feeds (curated list already included in an OPML file).
- Fetches cybersecurity news from the past N days (default: 1 day).
- Generates HTML, JSON and CSV files with news titles, brief descriptions, and links to articles, sorted by date, providing an overview of the latest cybersecurity news.
- Includes a customisable OPML file with a list of notable cybersecurity RSS sources.
- Supports asynchronous HTTP fetching using `aiohttp` for improved performance.
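The concurrent-fetching pattern behind the last feature can be sketched as follows. This is a minimal illustration of fanning out requests with `asyncio.gather`, not the tool's actual code: the `fetch_feed` stub sleeps briefly in place of a real `aiohttp` request, and the URLs are placeholders.

```python
import asyncio

async def fetch_feed(url):
    # Stand-in for an aiohttp GET request; sleeps instead of doing network I/O
    await asyncio.sleep(0.01)
    return f"<rss from {url}>"

async def fetch_all(urls):
    # All feeds are requested concurrently rather than one after another
    return await asyncio.gather(*(fetch_feed(u) for u in urls))

feeds = asyncio.run(fetch_all([
    "https://example.com/a.xml",
    "https://example.com/b.xml",
]))
print(len(feeds))  # → 2
```

With many slow feeds, the total wall-clock time approaches that of the slowest single feed instead of the sum of all of them, which is why `aiohttp` speeds up aggregation noticeably.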
CyberFeedBites is ready to use once dependencies are installed. To run it as is and fetch the last 24 hours of articles from the sources listed in the provided `cybersecnews-sources.opml` file, generating a summary as HTML, JSON and CSV files saved in the `data/html_reports/`, `data/json_reports/` and `data/csv_reports/` directories, follow these steps:
- Clone the repository:
  ```
  git clone https://github.com/dsgandurra/cyberfeedbites.git
  ```
- Move to the root folder:
  ```
  cd cyberfeedbites
  ```
- Install dependencies:
  ```
  pip install -r requirements.txt
  ```
- Run CyberFeedBites:
  ```
  python src/main.py
  ```
- Check the generated HTML file in the `data/html_reports/` directory.
- Clone this repository:
  ```
  git clone https://github.com/dsgandurra/cyberfeedbites.git
  cd cyberfeedbites
  ```
- Before running CyberFeedBites, it's recommended to use a virtual environment to keep your project's dependencies isolated. To do so, create and activate a virtual environment:

  On Linux/macOS:
  ```
  python3 -m venv myenv
  source myenv/bin/activate
  ```
  On Windows:
  ```
  python -m venv myenv
  .\myenv\Scripts\activate
  ```
- Install the required dependencies:
  ```
  pip install -r requirements.txt
  ```
  The required packages are:
  - `feedparser`: For parsing RSS feeds.
  - `beautifulsoup4`: For HTML parsing and manipulation.
  - `aiohttp`: For asynchronous HTTP requests to speed up feed fetching.
  - `requests`: For robust HTTP fetching with custom headers.
  - `tldextract`: For extracting domain names from URLs.
- Navigate to the root folder of the project:
  ```
  cd path/to/cyberfeedbites
  ```
- Run CyberFeedBites with optional parameters:
  ```
  python src/main.py [--start <start_days_ago>] [--end <end_days_ago>] [--opml <opml_file_path>] [--output-format <formats>] [--output-html-folder <folder>] [--output-csv-folder <folder>] [--output-json-folder <folder>] [--align-start-to-midnight] [--align-end-to-midnight] [--html-img] [--max-length-description <length>] [--exclude-keywords] [--exclude-keywords-file <file>] [--print-retrieved] [--print-skipped] [--order-by {date,title_date}] [--aggressive-filtering] [--aggressive-keywords-file <file>]
  ```
  - `--start`: Number of days ago to start fetching news (default: 1).
  - `--end`: Number of days ago to end fetching news (default: 0).
  - `--opml`: Path to the OPML file to use (default as configured).
  - `--output-format`: Comma-separated list of output formats (html, csv, json). Default is all.
  - `--output-html-folder`: Folder for HTML output (default configured).
  - `--output-csv-folder`: Folder for CSV output (default configured).
  - `--output-json-folder`: Folder for JSON output (default configured).
  - `--align-start-to-midnight`: Align the start date to midnight.
  - `--align-end-to-midnight`: Align the end date to 23:59:59.
  - `--max-length-description`: Maximum length for RSS feed descriptions (default: 200).
  - `--exclude-keywords`: Enable exclusion of entries containing specific keywords.
  - `--exclude-keywords-file`: Path to a file containing keywords to exclude, one per line. Overrides the default keywords.
  - `--print-retrieved`: Print a summary of retrieved articles at the end.
  - `--print-skipped`: Print a summary of skipped articles at the end.
  - `--order-by`: Order HTML output by `date` (default) or `title_date`.
  - `--html-img`: Include images in the HTML output (default is False).
  - `--aggressive-filtering`: Enable removal of articles that do NOT include any security keywords. Default is False.
  - `--aggressive-keywords-file`: Path to a file containing security keywords to keep, one per line. Overrides the default cybersecurity keywords.
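The `--start`/`--end` window and the midnight-alignment flags can be illustrated with a small sketch. This is a hypothetical helper written for this README, not the tool's actual implementation; the fixed `now` is only there to make the example deterministic.

```python
from datetime import datetime, timedelta

def window(start_days_ago=1, end_days_ago=0,
           align_start=False, align_end=False, now=None):
    # Hypothetical helper mirroring the semantics of --start/--end and the
    # --align-*-to-midnight flags described above
    now = now or datetime.now()
    start = now - timedelta(days=start_days_ago)
    end = now - timedelta(days=end_days_ago)
    if align_start:
        # --align-start-to-midnight: snap the start back to 00:00:00
        start = start.replace(hour=0, minute=0, second=0, microsecond=0)
    if align_end:
        # --align-end-to-midnight: snap the end forward to 23:59:59
        end = end.replace(hour=23, minute=59, second=59, microsecond=0)
    return start, end

now = datetime(2024, 1, 10, 14, 30)
start, end = window(start_days_ago=5, end_days_ago=2,
                    align_start=True, align_end=True, now=now)
print(start, end)  # → 2024-01-05 00:00:00 2024-01-08 23:59:59
```

So `--start 5 --end 2` with both alignment flags covers five full days ago through the end of the day two days ago, rather than two instants measured from the moment the script runs.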
Examples:
- Fetch news from the last 1 day (default):
  ```
  python src/main.py
  ```
- Fetch news from 5 to 2 days ago:
  ```
  python src/main.py --start 5 --end 2
  ```
- Generate only CSV output:
  ```
  python src/main.py --output-format csv
  ```
- Exclude entries containing keywords (using the defaults or a file):
  ```
  python src/main.py --exclude-keywords --exclude-keywords-file exclude.txt
  ```
The resulting HTML, JSON and CSV files, which list the news from the past 'X' days, will be saved in the `data/html_reports`, `data/json_reports`, and `data/csv_reports` folders, respectively. The filenames will be in the following format:
- `<prefix>_YYYY-MM-DD_HH-MM-SS.html`
- `<prefix>_YYYY-MM-DD_HH-MM-SS.json`
- `<prefix>_YYYY-MM-DD_HH-MM-SS.csv`

Here `<prefix>` is derived from the `category` attribute of the top-level `<outline>` element in the OPML file (with special characters removed and all letters converted to lowercase). If that attribute is missing, the `text` attribute will be used instead. If both are missing, an error will be raised.

In the provided OPML file, the top-level `<outline>` element contains `category="Cybersecurity News"`, so the resulting filename will begin with `cybersecuritynews`.
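The prefix rule described above can be sketched like this. It is a hypothetical re-implementation for illustration only; the tool's actual sanitisation may differ in detail (for example, in exactly which characters count as "special").

```python
import re

def filename_prefix(category=None, text=None):
    # The category attribute wins; fall back to text; error if both are missing
    value = category or text
    if not value:
        raise ValueError("top-level <outline> needs a 'category' or 'text' attribute")
    # Remove anything that is not a letter or digit, then lowercase
    return re.sub(r"[^A-Za-z0-9]", "", value).lower()

print(filename_prefix(category="Cybersecurity News"))  # → cybersecuritynews
```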
Each HTML file contains a table with the following columns:
- ID: The article's position in the list.
- Date: The date the article was published.
- Website: The name of the website and logo of the channel (if available or if stored in the OPML file).
- Title: The title of the article.
- Description: A brief description of the article.
- Link: The URL to the full article.
Each JSON file has the following format:
- title: The title of the article.
- link: The URL to the full article.
- published: The date the article was published.
- source: The website name of the source.
- description: A brief description of the article.
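For illustration, a single entry in the JSON report could look like the dictionary below. All of the values here are made up, and the exact date formatting in real output may differ:

```python
import json

# Hypothetical example of one article entry in the JSON report
entry = {
    "title": "Example vendor patches critical flaw",
    "link": "https://example.com/article",
    "published": "Mon, 15 Jan 2024 09:30:00 GMT",
    "source": "Example Security News",
    "description": "A brief description of the article.",
}
print(json.dumps(entry, indent=2))
```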
Each CSV file contains a table with the following columns:
- Date: The date the article was published.
- Website: The name of the website.
- Title: The title of the article.
- Description: A brief description of the article.
- Link: The URL to the full article.
The repository includes a sample OPML file (`data/rss_sources/cybersecnews-sources.opml`) containing a curated list of relevant RSS sources. You can edit this file to add or remove RSS feed URLs according to your preferences, or use a custom OPML file by passing its path via the `--opml` option.

CyberFeedBites expects each OPML file to include only one top-level `<outline>` element (i.e., one section or feed group, e.g., `<outline text="Cybersecurity News" title="Cybersecurity News Feeds" category="Cybersecurity News">`). Based on this, the program generates report files (HTML, JSON, CSV) using the top-level outline's `category` attribute (or `text`, if missing) as the filename prefix. If multiple top-level outlines are present in one OPML file, only the first will be processed; the others will be ignored. Hence, to organise feeds into multiple categories or sections, split them into separate OPML files, one per category, and run CyberFeedBites separately for each. This keeps the program simple and avoids handling multiple output files or combined reports, while allowing different look-back periods for different categories or sections.
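As a sketch, the OPML structure the tool expects, and the way only the first top-level `<outline>` would be read, can be shown with Python's standard-library XML parser. The feed URL below is a placeholder, and this is an illustration rather than the tool's actual parsing code:

```python
import xml.etree.ElementTree as ET

opml = """<opml version="1.0">
  <body>
    <outline text="Cybersecurity News" title="Cybersecurity News Feeds"
             category="Cybersecurity News">
      <outline text="Example Feed" xmlUrl="https://example.com/rss.xml"/>
    </outline>
  </body>
</opml>"""

root = ET.fromstring(opml)
top = root.find("./body/outline")  # find() returns only the first top-level outline
feed_urls = [o.get("xmlUrl") for o in top.findall("outline")]
print(top.get("category"), feed_urls)  # → Cybersecurity News ['https://example.com/rss.xml']
```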
- The `MAX_DAYS_BACK` parameter in `config.py` limits the maximum number of days allowed as input to prevent fetching an excessive number of RSS entries. You can increase this limit by editing the variable in `config.py`.
- You can also add or remove RSS feed sources by editing the `cybersecnews-sources.opml` file. Alternatively, you can pass a different OPML file as a parameter, allowing multiple feed sets to be maintained separately.
- CyberFeedBites generates HTML output based on a template located in the `data/templates` folder. The template also includes `sorttable.js` to enable sortable tables. You can modify `template.html` and the accompanying `style.css` for customisation. If you choose a different directory to store the HTML reports, ensure the references to the style CSS and `sorttable.js` are updated accordingly.
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.