Are you tired of manually sifting through car listings on turbo.az to gather data? Do you wish there was an automated way to compile detailed information about vehicles for sale in Azerbaijan? Look no further!
TurboScraper is a powerful Java-based web scraping tool designed to effortlessly extract comprehensive car listing data from turbo.az and organize it into a clean, accessible Excel spreadsheet.
- Effortless Data Extraction: Scrapes key information like title, price, location, mileage, year, body type (ban type), transmission, engine volume, and color directly from turbo.az listings.
- Deep Dive Details: Goes beyond summary information by navigating to individual product pages to fetch granular details for each car.
- Structured Output: Organizes all extracted data neatly into an Excel file (.xlsx), ready for analysis, reporting, or further processing.
- Robust & Reliable: Built with Jsoup for efficient HTML parsing and Apache POI for seamless Excel file generation.
TurboScraper operates in a few simple steps:
- It connects to the main car listings page on turbo.az.
- It identifies individual car listings on the page.
- For each listing, it extracts basic details and then navigates to the car's dedicated product page.
- On the product page, it extracts a wealth of detailed specifications.
- Finally, all collected data is compiled row-by-row into a new Excel spreadsheet named product_data.xlsx.
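The extraction steps above can be sketched with Jsoup roughly as follows. This is a minimal illustration, not the project's actual code: the class name `ScrapeSketch`, the CSS selectors, and the sample HTML are all assumptions made up for the example, and the real turbo.az markup will differ. It parses inline HTML so that it runs without touching the network.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ScrapeSketch {

    // Base URL used to resolve relative links found in the markup.
    public static final String BASE_URL = "https://turbo.az/";

    /** Collects the absolute product-page URL of every listing on a listings page. */
    public static List<String> listingUrls(Document listPage) {
        List<String> urls = new ArrayList<>();
        // "div.listing a.listing-link" is an assumed selector, not the site's real markup.
        for (Element link : listPage.select("div.listing a.listing-link")) {
            urls.add(link.absUrl("href"));
        }
        return urls;
    }

    /** Reads label/value pairs from a product page, keeping their page order. */
    public static Map<String, String> details(Document productPage) {
        Map<String, String> fields = new LinkedHashMap<>();
        // "div.property", ".label" and ".value" are assumed selectors as well.
        for (Element property : productPage.select("div.property")) {
            Element label = property.selectFirst(".label");
            Element value = property.selectFirst(".value");
            if (label != null && value != null) {
                fields.put(label.text(), value.text());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical markup standing in for a listings page and a product page.
        String listHtml =
                "<div class='listing'><a class='listing-link' href='/autos/1'>Car A</a></div>"
              + "<div class='listing'><a class='listing-link' href='/autos/2'>Car B</a></div>";
        String productHtml =
                "<div class='property'><span class='label'>Year</span><span class='value'>2015</span></div>"
              + "<div class='property'><span class='label'>Color</span><span class='value'>White</span></div>";

        Document listPage = Jsoup.parse(listHtml, BASE_URL);
        System.out.println(listingUrls(listPage));

        Document productPage = Jsoup.parse(productHtml, BASE_URL);
        System.out.println(details(productPage));
    }
}
```

Against the live site, the two `Document` objects would come from `Jsoup.connect(url).get()` instead of `Jsoup.parse(...)`, and the selectors would need to be adapted to the current page structure.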
To get this project up and running on your local machine, follow these steps:
- Java Development Kit (JDK): Make sure you have JDK 8 or higher installed.
- Maven (recommended) or Gradle for dependency management.
- Clone the repository:

      git clone https://github.com/YourUsername/TurboScraper.git
      cd TurboScraper

- Add Dependencies:
  If using Maven, add the following to your pom.xml:

      <dependencies>
        <dependency>
          <groupId>org.jsoup</groupId>
          <artifactId>jsoup</artifactId>
          <version>1.17.2</version>
        </dependency>
        <dependency>
          <groupId>org.apache.poi</groupId>
          <artifactId>poi</artifactId>
          <version>5.2.5</version>
        </dependency>
        <dependency>
          <groupId>org.apache.poi</groupId>
          <artifactId>poi-ooxml</artifactId>
          <version>5.2.5</version>
        </dependency>
      </dependencies>

  (Alternatively, you can manually download the Jsoup and Apache POI JARs and add them to your project's build path. In that case, Apache POI's own dependencies, such as XMLBeans and Commons Compress, must be on the classpath as well; Maven resolves these automatically.)
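- Compile the project:

      # Adjust paths as needed. On Windows, separate classpath entries with ; instead of :
      # -d . writes the class files into a directory tree matching the package (org/example)
      javac -d . -cp "path/to/jsoup.jar:path/to/poi.jar:path/to/poi-ooxml.jar" Main.java

  Or, if using Maven:

      mvn clean install
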
- Run the Main class:

      # Windows
      java -cp ".;path/to/jsoup.jar;path/to/poi.jar;path/to/poi-ooxml.jar" org.example.Main

      # macOS/Linux
      java -cp ".:path/to/jsoup.jar:path/to/poi.jar:path/to/poi-ooxml.jar" org.example.Main

  Or, if using Maven:

      mvn exec:java -Dexec.mainClass="org.example.Main"

After successful execution, an Excel file named product_data.xlsx will be generated in your project's root directory, containing all the scraped car data.
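To give a feel for the row-by-row output, here is a minimal sketch of how a workbook like product_data.xlsx could be written with Apache POI and read back for a quick check. The class name `ExcelSketch`, the sheet name, the column layout, and the sample rows are assumptions for illustration; the tool's real columns may differ. The example writes to a temporary file so that it cannot overwrite real output.

```java
import org.apache.poi.ss.usermodel.Cell;
import org.apache.poi.ss.usermodel.DataFormatter;
import org.apache.poi.ss.usermodel.Row;
import org.apache.poi.ss.usermodel.Sheet;
import org.apache.poi.ss.usermodel.Workbook;
import org.apache.poi.xssf.usermodel.XSSFWorkbook;

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExcelSketch {

    /** Writes each String[] as one spreadsheet row, starting at row 0. */
    public static void writeRows(Path target, List<String[]> rows) throws IOException {
        try (Workbook workbook = new XSSFWorkbook();
             OutputStream out = Files.newOutputStream(target)) {
            Sheet sheet = workbook.createSheet("Cars"); // sheet name is an assumption
            int rowIndex = 0;
            for (String[] values : rows) {
                Row row = sheet.createRow(rowIndex++);
                for (int col = 0; col < values.length; col++) {
                    row.createCell(col).setCellValue(values[col]);
                }
            }
            workbook.write(out);
        }
    }

    /** Reads the first sheet back as text, one inner list per row. */
    public static List<List<String>> readRows(Path source) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        DataFormatter formatter = new DataFormatter();
        try (InputStream in = Files.newInputStream(source);
             Workbook workbook = new XSSFWorkbook(in)) {
            for (Row row : workbook.getSheetAt(0)) {
                List<String> values = new ArrayList<>();
                for (Cell cell : row) {
                    values.add(formatter.formatCellValue(cell));
                }
                rows.add(values);
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data: a header row followed by two listings.
        List<String[]> rows = Arrays.asList(
                new String[] {"Title", "Price", "Year"},
                new String[] {"Car A", "10000", "2015"},
                new String[] {"Car B", "15000", "2018"});

        Path file = Files.createTempFile("product_data", ".xlsx");
        writeRows(file, rows);
        System.out.println("Wrote " + file);
        System.out.println(readRows(file));
    }
}
```

Pointing `readRows` at the generated product_data.xlsx is a quick way to confirm that the scrape produced the expected number of rows.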
This tool is intended for educational purposes and personal data analysis. Please be mindful of turbo.az's terms of service and robots.txt file. Excessive or rapid scraping can lead to your IP being blocked. Use responsibly.
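One simple way to avoid rapid-fire requests is to enforce a minimum pause between page fetches. The sketch below is an assumption about how that could be done, not something the project is known to include; the class name `PoliteThrottle` and the 2-second delay are illustrative only, and it uses nothing beyond the Java standard library.

```java
import java.util.concurrent.TimeUnit;

public class PoliteThrottle {

    private final long minDelayNanos;
    private long lastCallNanos;
    private boolean firstCall = true;

    public PoliteThrottle(long minDelayMillis) {
        this.minDelayNanos = TimeUnit.MILLISECONDS.toNanos(minDelayMillis);
    }

    /** Blocks until at least the configured delay has passed since the previous call. */
    public synchronized void await() throws InterruptedException {
        if (!firstCall) {
            long elapsed = System.nanoTime() - lastCallNanos;
            long remaining = minDelayNanos - elapsed;
            if (remaining > 0) {
                TimeUnit.NANOSECONDS.sleep(remaining);
            }
        }
        firstCall = false;
        lastCallNanos = System.nanoTime();
    }

    public static void main(String[] args) throws InterruptedException {
        // 2000 ms is an arbitrary example value; choose a delay the site can comfortably handle.
        PoliteThrottle throttle = new PoliteThrottle(2000);
        for (int i = 1; i <= 3; i++) {
            throttle.await();
            System.out.println("request " + i + " would be sent now");
        }
    }
}
```

Calling `throttle.await()` immediately before each page fetch keeps the request rate bounded regardless of how quickly the parsing itself runs.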
Contributions are welcome! If you have ideas for improvements, new features, or bug fixes, feel free to open an issue or submit a pull request.
This project is open-source and available under the MIT License.
Happy Scraping!