A project to scrape product data from Tokopedia, process it, and analyze it using Python libraries such as Selenium, BeautifulSoup, and Pandas.

ysmnaraindas-work/scrappingdata

Scraping Data: Insights from Tokopedia Seblak Products

Project Overview

This project scrapes and analyzes listings for "seblak" products on Tokopedia, a popular e-commerce platform in Indonesia. For each listing it extracts the product name, price, seller, city, sales count, and rating, which together give insight into market trends and customer preferences. The data is collected with Selenium and BeautifulSoup, then processed and analyzed with Pandas.

Key Features

  • Data Scraping: Collect detailed product data from Tokopedia using Selenium and BeautifulSoup.
  • Data Cleaning: Process and clean raw data for accurate analysis.
  • Insights Generation: Analyze sales trends, pricing strategies, and customer ratings.
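The cleaning step typically starts by turning scraped text labels into numbers. The sketch below is illustrative only: the helper names `parse_price` and `parse_sold` are hypothetical, and the label formats (`Rp12.500` for prices, `1rb+ terjual` for sales, with `rb` meaning thousands) are assumptions about what the raw data looks like, not something this README confirms.

```python
import re


def parse_price(text):
    """Convert a price label such as 'Rp12.500' to an integer number of rupiah.

    Assumes dots are thousands separators; returns None if no digits are found.
    """
    digits = re.sub(r"[^\d]", "", text)
    return int(digits) if digits else None


def parse_sold(text):
    """Convert a sales label such as '1rb+ terjual' to an integer count.

    Assumed format: an optional decimal comma and an optional 'rb'
    (thousands) suffix. Returns None if the label contains no number.
    """
    match = re.search(r"(\d+(?:,\d+)?)\s*(rb)?", text.lower())
    if not match:
        return None
    number = float(match.group(1).replace(",", "."))
    if match.group(2):
        number *= 1000
    # round() avoids off-by-one results from floating-point error
    return int(round(number))
```

If the real labels differ, only the regular expressions should need adjusting; the rest of the pipeline can keep working with plain integers.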

Technology Stack

  • Python: Selenium, BeautifulSoup, Pandas, NumPy
  • Jupyter Notebook: For data analysis and visualization
  • CSV: For data storage and sharing

Dataset

  • seblak_tokopedia.csv: Raw scraped data from Tokopedia.
  • seblak_tokopedia_clean.csv: Cleaned and processed dataset ready for analysis.
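As a rough illustration of how the raw file could become the clean one, the sketch below drops incomplete and duplicate rows. The column names in `REQUIRED` and the functions `clean_rows` and `clean_csv` are hypothetical; the actual columns and cleaning rules used in this project may differ. It uses only the standard library so it runs without extra dependencies.

```python
import csv

# Hypothetical column names; adjust to match the real CSV header.
REQUIRED = ("name", "price", "seller", "city", "sold", "rating")


def clean_rows(rows):
    """Trim whitespace, then drop rows with empty fields and exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        trimmed = {key: (row.get(key) or "").strip() for key in REQUIRED}
        if not all(trimmed.values()):
            continue  # skip listings with a missing field
        fingerprint = tuple(trimmed.values())
        if fingerprint in seen:
            continue  # skip listings already kept
        seen.add(fingerprint)
        cleaned.append(trimmed)
    return cleaned


def clean_csv(raw_file, clean_file):
    """Read raw rows from one file object, write cleaned rows to another.

    Returns the number of rows written (excluding the header).
    """
    cleaned = clean_rows(csv.DictReader(raw_file))
    writer = csv.DictWriter(clean_file, fieldnames=REQUIRED)
    writer.writeheader()
    writer.writerows(cleaned)
    return len(cleaned)
```

To apply it to the files listed above, open `seblak_tokopedia.csv` for reading and `seblak_tokopedia_clean.csv` for writing (both with `newline=""` and `encoding="utf-8"`) and pass the two file objects to `clean_csv`.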

How to Run the Project

  1. Clone this repository:
    git clone https://github.com/ysmnaraindas-work/scrappingdata.git
    cd scrappingdata
  2. Install dependencies:
    pip install -r requirements.txt
  3. Run the scraper:
    python main.py
  4. Analyze the data by opening the Jupyter Notebook:
    jupyter notebook Data-Scrapping.ipynb

Directory Structure

scrappingdata/
├── main.py                   # Script for scraping data from Tokopedia
├── seblak_tokopedia.csv      # Raw scraped data
├── seblak_tokopedia_clean.csv # Cleaned and processed dataset
├── Data-Scrapping.ipynb      # Jupyter Notebook for data analysis
├── README.md                 # Project documentation
└── requirements.txt          # List of dependencies

Results and Insights

  • Top-selling Products: Ranked by the sales count of each listing.
  • Price Distribution: Examined how listing prices are spread to understand the market's typical price range.
  • Customer Preferences: Examined ratings and their relationship with sales.
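The three analyses above can be sketched in a few lines. This is a dependency-free illustration, not the notebook's actual code: the rows in `listings` are made up for the example and are not values from the dataset, and the function names are hypothetical. The notebook itself would more naturally do this with Pandas.

```python
from statistics import mean, median

# Made-up rows for illustration only; not taken from the real dataset.
listings = [
    {"name": "Seblak A", "price": 10000, "sold": 1000, "rating": 4.9},
    {"name": "Seblak B", "price": 15000, "sold": 250, "rating": 4.7},
    {"name": "Seblak C", "price": 8000, "sold": 500, "rating": 4.8},
    {"name": "Seblak D", "price": 20000, "sold": 100, "rating": 4.5},
]


def top_sellers(rows, n=3):
    """Return the n listings with the highest sales count."""
    return sorted(rows, key=lambda r: r["sold"], reverse=True)[:n]


def price_summary(rows):
    """Summarize the price distribution with min, median, and max."""
    prices = [r["price"] for r in rows]
    return {"min": min(prices), "median": median(prices), "max": max(prices)}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences.

    Raises ZeroDivisionError if either sequence has zero variance.
    """
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rating_vs_sales = pearson(
    [r["rating"] for r in listings],
    [r["sold"] for r in listings],
)
```

A coefficient near 1 or -1 would suggest a strong linear relationship between rating and sales, while a value near 0 would suggest little linear relationship; it says nothing about causation.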

Business Recommendations

  • Optimize pricing strategies based on competitors' pricing.
  • Focus on products with higher ratings to improve customer trust.
  • Consider regional preferences for better targeting.

Contact

For any inquiries, feel free to reach out via email at ysmnaraindas.work@gmail.com.

⭐ Don't forget to star this repository if you found it useful!
