This Python script automates the extraction of news articles. It uses Robocorp and Selenium to scrape news websites, and its search settings can be adjusted to find specific news items. The script handles errors gracefully, keeps the code clean and readable, and includes logging to support RPA development.

nathyBekele/news-site-rpa-bot

News Site RPA Bot

Overview

This project is an automated bot designed to gather news articles from a specific website and store the relevant information in an Excel file. It allows users to specify a search topic and a target month for retrieving articles.
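
To illustrate the month filter described above, here is a minimal sketch of how an article date could be checked against a target month. The function name `in_target_month` and its parameters are assumptions made for this example; the repository's actual filtering logic is not shown in this README and may differ.

```python
from datetime import date


def in_target_month(article_date: date, year: int, month: int) -> bool:
    """Return True when the article was published in the given year and month.

    Hypothetical helper: the name and signature are illustrative only.
    """
    return article_date.year == year and article_date.month == month


# Keep only the dates that fall in May 2024.
sample_dates = [date(2024, 4, 30), date(2024, 5, 1), date(2024, 5, 31)]
may_dates = [d for d in sample_dates if in_target_month(d, 2024, 5)]
```

Comparing year and month together avoids matching the same month in a different year.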

Features

  • Automated web scraping to collect news articles from a target website.
  • User-defined search topic and month for filtering articles.
  • Extraction of article metadata including title, date, author, description, and picture link.
  • Analysis functions to count occurrences of search phrases in the article descriptions and detect the presence of money-related information.
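
The two analysis features listed above could be sketched roughly as follows. The function names `count_phrase` and `contains_money`, and the money formats the pattern recognises (for example `$1,200.50`, `11 dollars`, `5 USD`), are assumptions for illustration; the README does not specify which formats the bot actually detects.

```python
import re

# Assumed money formats: "$11.1", "$111,111.11", "11 dollars", "11 USD".
MONEY_PATTERN = re.compile(
    r"\$\d[\d,]*(?:\.\d+)?"                         # dollar sign followed by an amount
    r"|\b\d[\d,]*(?:\.\d+)?\s*(?:dollars|USD)\b",   # amount followed by a currency word
    re.IGNORECASE,
)


def count_phrase(text: str, phrase: str) -> int:
    """Count case-insensitive occurrences of phrase in text (substring match)."""
    if not phrase:
        return 0
    return len(re.findall(re.escape(phrase), text, flags=re.IGNORECASE))


def contains_money(text: str) -> bool:
    """Return True if the text mentions an amount in one of the assumed formats."""
    return MONEY_PATTERN.search(text) is not None
```

Note that `count_phrase` matches substrings, so a search for "oil" would also count "boil"; wrapping the escaped phrase in `\b` word boundaries would be one way to tighten this.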

File Hierarchy

  • .env: Configuration file for environment variables.
  • .gitignore: File specifying which files and directories to ignore in Git.
  • README.md: Project documentation.
  • robot.yaml: Robocorp project configuration file.
  • tasks.py: Main Python script containing task definitions.
  • conda.yaml: Conda environment configuration file.
  • data/: Directory for data files.
    • News.xlsx: Excel file for storing news data.
    • input.json: JSON file containing input variables.
  • src/: Source code directory.
    • __init__.py: Python package initializer.
    • news_article.py: Module for NewsArticle class.
    • news_scraper.py: Main module for NewsScraper class.
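
As a rough picture of what `news_article.py` might hold, the sketch below models an article with the metadata fields named under Features. The field names, types, and the `to_row` helper are assumptions; the real `NewsArticle` class may be structured differently.

```python
from dataclasses import asdict, dataclass


@dataclass
class NewsArticle:
    """Hypothetical container for one scraped article."""

    title: str
    date: str          # kept as text here; the real class may use a date type
    author: str
    description: str
    picture_link: str

    def to_row(self) -> dict:
        """Return the fields as a dict, e.g. for writing one spreadsheet row."""
        return asdict(self)
```

A dataclass keeps the field list in one place, and `asdict` preserves field order, which is convenient when the fields map to spreadsheet columns.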

Usage

  1. Ensure all dependencies are installed and configured.
  2. Update the input.json file located in the data folder with the desired search topic, news timeframe, and category.
  3. Run the tasks.py script to initiate the bot.
  4. Extracted article data will be stored in the News.xlsx Excel file located in the data folder.
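
Step 2 above can be sketched as a small loader that reads `data/input.json` and checks that the expected settings are present. The key names `search_phrase`, `months`, and `category` are hypothetical; check the repository's own `input.json` for the names it actually uses.

```python
import json
from pathlib import Path

# Hypothetical key names; the real input.json may use different ones.
REQUIRED_KEYS = ("search_phrase", "months", "category")


def load_input(path: str = "data/input.json") -> dict:
    """Read the input file and fail early if an expected key is missing."""
    with Path(path).open(encoding="utf-8") as handle:
        config = json.load(handle)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise KeyError(f"input.json is missing keys: {', '.join(missing)}")
    return config
```

Validating the file before the browser starts means a typo in the input surfaces immediately rather than partway through a scraping run.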

Requirements

  • Python 3.x
  • robocorp-browser
  • rpaframework
