The goal of this project was to develop a scraper that collects application data from Google Play and records the category of each application.
This project serves as a supporting tool for an application classification project.
The scraper is built on google-play-scraper, an external library written by facundoolano.
The external scraper used in this project has some limitations, the most significant being that retrieving applications from a specific category returns a maximum of 200 results.
To work around this limitation, each category is queried repeatedly with different country and collection parameters, and the results are combined.
The scraper collects data for each category by:
- Querying multiple countries from a predefined list
- Retrieving data for three different collections per country:
  - TOP_PAID
  - TOP_FREE
  - TOP_GROSSING

Maximum number of retrieved applications per category:
- 200 per collection and country
- 600 per country (200 × 3 collections)
- 600 × number of countries in total
This implementation uses 249 country codes, so the upper bound is 200 × 3 × 249 = 149,400 applications per category.
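
The query strategy above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `collectAppIds`, `maxResultsPerCategory`, and `listFn` are hypothetical names. `listFn` stands in for the external library's list call, and the option names (`category`, `collection`, `country`, `num`) are assumptions about what that call accepts. The sketch also assumes that duplicates across countries are dropped and that a failed query is skipped.

```javascript
// Collection names as listed above.
const COLLECTIONS = ['TOP_FREE', 'TOP_PAID', 'TOP_GROSSING'];
const MAX_PER_QUERY = 200; // per-query cap of the external scraper

// Upper bound on results for one category, before deduplication.
function maxResultsPerCategory(countryCount, collectionCount = COLLECTIONS.length) {
  return MAX_PER_QUERY * collectionCount * countryCount;
}

// listFn (hypothetical) takes { category, collection, country, num }
// and resolves to an array of objects that carry an appId.
async function collectAppIds(listFn, category, countries) {
  const ids = new Set(); // the same app can rank in several countries
  for (const country of countries) {
    for (const collection of COLLECTIONS) {
      try {
        const apps = await listFn({ category, collection, country, num: MAX_PER_QUERY });
        for (const app of apps) ids.add(app.appId);
      } catch (err) {
        // Assumption: one failed country/collection pair is skipped, not fatal.
        console.error(`skipped ${country}/${collection}: ${err.message}`);
      }
    }
  }
  return [...ids];
}
```

Because the same application usually appears in the rankings of many countries, the number of unique IDs per category will typically be well below the upper bound.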
✔️ The scraper first collects application IDs, processing each category separately and storing the results in category-specific files.
✔️ Once files for all categories are generated, a final file containing all application IDs is created.
✔️ The final step generates a CSV output file with complete application details.
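
The first two steps might look like the sketch below. The file layout is an assumption made for illustration (one `<CATEGORY>.txt` file per category with one application ID per line), and `writeCategoryFile` and `mergeCategoryFiles` are hypothetical helper names, not functions from this project.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Assumed layout: "<dir>/<CATEGORY>.txt", one app ID per line.
function writeCategoryFile(dir, category, ids) {
  fs.mkdirSync(dir, { recursive: true });
  fs.writeFileSync(path.join(dir, `${category}.txt`), ids.join('\n') + '\n');
}

// Read every category file in dir, drop duplicates, write one combined file.
function mergeCategoryFiles(dir, outFile) {
  const all = new Set();
  for (const name of fs.readdirSync(dir)) {
    if (!name.endsWith('.txt')) continue;
    const lines = fs.readFileSync(path.join(dir, name), 'utf8').split('\n');
    for (const line of lines) {
      const id = line.trim();
      if (id) all.add(id); // ignore blank lines
    }
  }
  const merged = [...all].sort(); // sorted so repeated runs give a stable file
  fs.writeFileSync(outFile, merged.join('\n') + '\n');
  return merged;
}
```

Keeping the ID collection separate from the detail lookup means an interrupted run can resume from the files already written.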
Since this scraper was built for another project, it only collects the following details for each application:
- Number of installs
- Score on Google Play
- Number of ratings
- Number of reviews
- Category
- Free or paid status
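
As an illustration, the fields listed above could be turned into one CSV row per application as sketched below. The property names (`installs`, `score`, `ratings`, `reviews`, `genreId`, `free`) are assumptions about the shape of the details object returned by the external scraper. The column order and the helpers `csvField` and `toCsvRow` are hypothetical.

```javascript
// Assumed column order of the output file.
const CSV_HEADER = ['appId', 'installs', 'score', 'ratings', 'reviews', 'category', 'free'];

// Quote a value only when it contains a comma, a quote, or a line break.
function csvField(value) {
  const text = value === undefined || value === null ? '' : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

// details: object shaped like the scraper's app details (field names assumed).
function toCsvRow(details) {
  return [
    details.appId,
    details.installs, // may look like "1,000,000+", so quoting matters
    details.score,
    details.ratings,
    details.reviews,
    details.genreId,
    details.free,
  ].map(csvField).join(',');
}
```

The output file would then consist of `CSV_HEADER.join(',')` followed by one `toCsvRow(...)` line per application.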
The scraper can be executed in two modes:
- 🚀 Normal Mode – retrieves only application data
- 📊 Ratio Mode – retrieves application data and generates a paid vs. free report
Normal mode:

    npm install
    node app.js

Ratio mode:

    npm install
    node app.js ratio
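
A minimal sketch of how the mode argument could be read and how a paid vs. free summary could be computed is shown below. The contents of the actual report are not described here, so `paidFreeReport` and its output shape are assumptions, as is the helper name `parseMode`.

```javascript
// Assumption: the mode is the first argument after the script name,
// as in "node app.js ratio".
function parseMode(argv) {
  return argv[2] === 'ratio' ? 'ratio' : 'normal';
}

// apps: array of objects with a boolean `free` field.
// Anything not explicitly marked free is counted as paid.
function paidFreeReport(apps) {
  const free = apps.filter((app) => app.free === true).length;
  const paid = apps.length - free;
  return {
    free,
    paid,
    paidShare: apps.length === 0 ? 0 : paid / apps.length,
  };
}
```

In a script, `parseMode(process.argv)` would select the mode, and the report would only be generated when it returns `'ratio'`.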