Skip to content

Scraper for Google News articles with headline extraction, keyword targeting, and proxy support.

License

Notifications You must be signed in to change notification settings

exbomkimaa/Google-News-scraper

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 
 
 

Repository files navigation

Google News Scraper 📰

GitHub Repo Python License

Overview

Welcome to the Google News Scraper repository! This project allows you to extract articles from Google News efficiently. With features like headline extraction, keyword targeting, and proxy support, you can gather news articles tailored to your needs.

If you want to dive right in, download the latest version from the Releases section and follow the instructions below.

Features

  • Headline Extraction: Capture the main headlines from news articles.
  • Keyword Targeting: Focus on specific topics or keywords to filter your results.
  • Proxy Support: Rotate proxies to avoid detection and enhance scraping efficiency.
  • Data Extraction: Utilize Beautiful Soup and Requests for effective data handling.
  • Headless Scraping: Use headless browsers to scrape dynamic content.

Table of Contents

  1. Installation
  2. Usage
  3. Configuration
  4. Example
  5. Contributing
  6. License
  7. Support

Installation

To get started with the Google News Scraper, you need to install the required libraries. You can do this using pip:

pip install beautifulsoup4 requests

Make sure you have Python 3.6 or higher installed. You can check your Python version with:

python --version

After installing the required libraries, download the latest release from the Releases section. Extract the files and navigate to the project directory.

Usage

Once you have everything set up, you can start using the scraper. Here’s a simple example of how to run the script:

python google_news_scraper.py

This command will initiate the scraping process and output the results to your console.

Command Line Arguments

You can customize your scraping process using command line arguments:

  • --keywords: Specify keywords to filter articles.
  • --proxy: Provide a proxy address for scraping.
  • --output: Define the output file for saving results.

Example:

python google_news_scraper.py --keywords "technology" --proxy "http://your.proxy:port" --output results.json

Configuration

You can configure the scraper settings in the config.py file. This includes:

  • Default keywords
  • Proxy settings
  • Output formats

Make sure to adjust these settings to match your requirements.

Example

Here’s a simple example of how the scraper works:

  1. Run the Scraper: Execute the scraper with your desired parameters.
  2. Results: The scraper will fetch articles based on your keywords and output them in the specified format.

Sample Output

[
    {
        "headline": "Latest Advances in AI Technology",
        "link": "https://news.example.com/latest-advances-in-ai",
        "date": "2023-10-01"
    },
    {
        "headline": "Tech Giants Collaborate for Sustainable Solutions",
        "link": "https://news.example.com/tech-giants-sustainable-solutions",
        "date": "2023-10-02"
    }
]

Contributing

We welcome contributions to enhance the Google News Scraper. Here’s how you can help:

  1. Fork the repository: Create your own copy of the project.
  2. Create a branch: Use a descriptive name for your branch.
  3. Make your changes: Implement your features or fixes.
  4. Submit a pull request: Share your changes with the community.

Please ensure your code adheres to the existing style and includes appropriate tests.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Support

If you encounter any issues or have questions, please check the Releases section for updates. You can also create an issue in the repository for any bugs or feature requests.

Acknowledgments

  • Beautiful Soup: For parsing HTML and XML documents.
  • Requests: For making HTTP requests easily.
  • GitHub: For hosting and managing the project.

Conclusion

The Google News Scraper is a powerful tool for anyone looking to gather news articles from Google News. With its easy-to-use interface and robust features, you can quickly access the information you need.

For more information, visit the Releases section to download the latest version and start scraping today!

Packages

No packages published

Contributors 2

  •  
  •  

Languages