A powerful and modular Command-Line Interface (CLI) web crawler built in Node.js, developed using a Test-Driven Development (TDD) approach. This application is designed for efficient and customizable web scraping tasks.
- Command-Line Interface (CLI): Interact with the crawler directly from your terminal for seamless integration into scripts and workflows.
- Test-Driven Development (TDD): Developed with a focus on reliability and maintainability, ensuring robust performance through comprehensive testing.
- Modular Architecture: Easily extend and customize functionalities to suit various web scraping needs.
- Asynchronous Operations: Leverages Node.js's non-blocking I/O for efficient and concurrent web crawling.
- Error Handling: Implements robust mechanisms to manage errors and retries, ensuring resilience during crawling sessions.
- Ethical Scraping: Adheres to best practices by respecting `robots.txt` directives and includes guidelines for responsible data collection.
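The asynchronous, retry-aware behavior described above can be sketched as a small wrapper that retries a failing async operation with a growing backoff delay. This is an illustrative pattern only; `withRetry` is not necessarily the project's actual API:

```javascript
// Illustrative sketch, not the project's actual API: retry any async
// operation up to `retries` times, backing off longer on each attempt.
async function withRetry(fn, retries = 3, delayMs = 100) {
  for (let attempt = 1; attempt <= retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      if (attempt === retries) throw err; // give up after the last attempt
      // wait before retrying; the delay grows with each attempt
      await new Promise((resolve) => setTimeout(resolve, delayMs * attempt));
    }
  }
}
```

A crawler built this way can wrap each page fetch in `withRetry` so that one transient network error does not abort the whole crawling session.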
- Clone the repository:

      git clone https://github.com/TheAyushB/WebCrawlerJS.git
      cd WebCrawlerJS

- Install dependencies:

      npm install

- Usage:

      node main.js <URL to crawl>
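The entry point receives the target URL as a command-line argument. A minimal sketch of that argument-parsing pattern follows; it is illustrative only, and the project's actual `main.js` may differ:

```javascript
// Illustrative sketch of a CLI entry point; the real main.js may differ.
// In Node.js, process.argv[2] holds the first user-supplied argument.
function parseArgs(argv) {
  const urlString = argv[2];
  if (!urlString) {
    throw new Error("usage: node main.js <URL to crawl>");
  }
  return new URL(urlString); // throws a TypeError on malformed URLs
}

// A real entry point would call parseArgs(process.argv) and start the crawl.
```

Validating the URL up front with the WHATWG `URL` constructor lets the program fail fast with a usage message instead of crawling a malformed address.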
Project structure:

    ├── main.js        # Entry point for the CLI application
    ├── crawl.js       # Core crawling logic
    ├── crawl.test.js  # Unit tests for crawling functionality
    ├── report.js      # Report generation logic
    ├── report.test.js # Unit tests for report generation
    ├── package.json   # Project metadata and dependencies
    ├── README.md      # Project documentation
    └── LICENSE        # License information
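A common building block in crawlers like the one `crawl.js` implements is URL normalization, so that variants of the same page (different protocol, letter case, or trailing slash) are counted once. The helper below is a hypothetical sketch; the name `normalizeURL` and its exact behavior are assumptions, not documented exports of this project:

```javascript
// Hypothetical helper in the spirit of crawl.js; not the module's
// confirmed API. Maps equivalent URLs to a single deduplication key,
// e.g. both "https://Example.com/path/" and "http://example.com/path"
// become "example.com/path".
function normalizeURL(urlString) {
  const url = new URL(urlString); // hostname is lowercased by the parser
  // drop a trailing slash so "/path/" and "/path" compare equal
  const path = url.pathname.endsWith("/")
    ? url.pathname.slice(0, -1)
    : url.pathname;
  return `${url.hostname}${path}`;
}
```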
This project uses Jest for testing. To execute the test suite:

    npm test
Contributions are welcome! Please adhere to the following guidelines:
- Fork the repository.
- Create a new branch for your feature or bug fix.
- Implement your changes, ensuring adherence to the project's coding standards.
- Write tests to cover new or modified functionality.
- Run all tests to confirm they pass.
- Submit a pull request with a detailed description of your changes.
This project is licensed under the MIT License. See the LICENSE file for more details.
Special thanks to the open-source community for providing the tools and libraries that made this project possible.