Extract product data, reviews, and search results from Tokopedia with ease.
Tokopaedi is a powerful Python library designed for scraping e-commerce data from Tokopedia, including product searches, detailed product information, and customer reviews. Ideal for developers, data analysts, and businesses looking to analyze Tokopedia's marketplace.
- Product Search: Search Tokopedia products by keyword with customizable filters (price, rating, condition, etc.).
- Detailed Product Data: Retrieve rich product details, including variants, pricing, stock, and media.
- Customer Reviews: Scrape product reviews with ratings, timestamps, and more.
- Serializable Results: Dataclass-based results with
.json()
for easy export to JSON or pandas DataFrames. - SearchResults Container: Iterable and JSON-serializable container that supports enrich_details() and enrich_reviews() to automatically fetch product metadata and reviews for each item.
Tokopaedi is available on PyPi: https://pypi.org/project/tokopaedi/
Install Tokopaedi via pip:
pip install tokopaedi
from tokopaedi import search, SearchFilters, get_product, get_reviews
import json
filters = SearchFilters(
bebas_ongkir_extra = True,
pmin = 15000000,
pmax = 25000000,
rt = 4.5
)
results = search("Asus Zenbook S14 32GB", max_result=10, debug=True, filters=filters)
results.enrich_details(debug=True)
results.enrich_reviews(max_result=50, debug=True)
with open('log.json','w') as f:
f.write(json.dumps(results.json(), indent=4))
print(results.json())
π search(keyword: str, max_result: int = 100, filters: Optional[SearchFilters] = None, debug: bool = False) -> SearchResults
Search for products from Tokopedia.
Parameters:
-
keyword
: string keyword (e.g.,"logitech mouse"
). -
max_result
: Expected number of results to return. -
filters
: OptionalSearchFilters
instance to narrow search results. -
debug
: Show debug message if True
Returns:
- A
SearchResults
instance (list-like object ofProductSearchResult
), supporting.json()
for easy export.
π¦ get_product(product_id: Optional[Union[int, str]] = None, url: Optional[str] = None, debug: bool = False) -> ProductData
Fetch detailed information for a given Tokopedia product.
Parameters:
product_id
: (Optional) The product ID returned fromsearch()
. If provided, this will take precedence overurl
.url
: (Optional) The full product URL. Used only ifproduct_id
is not provided.debug
: IfTrue
, prints debug output for troubleshooting.
β οΈ Eitherproduct_id
orurl
must be provided. If both are given,product_id
is used andurl
is ignored.
Returns:
- A
ProductData
instance containing detailed information such as product name, pricing, variants, media, stock, rating, etc. - Supports
.json()
for easy serialization (e.g., to use withpandas
or export as.json
).
π£οΈ get_reviews(product_id: Optional[Union[int, str]] = None, url: Optional[str] = None, max_count: int = 20, debug: bool = False) -> List[ProductReview]
Scrape customer reviews for a given product.
Parameters:
product_id
: (Optional) The product ID to fetch reviews for. Takes precedence overurl
if both are provided.url
: (Optional) Full product URL. Used only ifproduct_id
is not provided.max_count
: Maximum number of reviews to fetch (default: 20).debug
: Show debug messages ifTrue
.
β οΈ Eitherproduct_id
orurl
must be provided.
Returns:
- A list of
ProductReview
objects. - Each object supports
.json()
for serialization (e.g., for use withpandas
or JSON export).
Returns:
- A new
SearchResults
object with.product_detail
and.product_reviews
fields filled in (if data was provided).
Use SearchFilters
to refine your search results. All fields are optional. Pass it into the search()
function via the filters
argument.
from tokopaedi import SearchFilters, search
filters = SearchFilters(
pmin=100000,
pmax=1000000,
condition=1, # 1 = New
is_discount=True,
bebas_ongkir_extra=True,
rt=4.5, # Minimum rating 4.5
latest_product=30 # Products listed in the last 30 days
)
results = search("logitech mouse", filters=filters)
Field | Type | Description | Accepted Values |
---|---|---|---|
pmin |
int |
Minimum price (in IDR) | e.g., 100000 |
pmax |
int |
Maximum price (in IDR) | e.g., 1000000 |
condition |
int |
Product condition | 1 = New, 2 = Used |
shop_tier |
int |
Type of shop | 2 = Mall, 3 = Power Shop |
rt |
float |
Minimum rating | e.g., 4.5 |
latest_product |
int |
Product recency filter | 7 , 30 , 90 |
bebas_ongkir_extra |
bool |
Filter for extra free shipping | True / False |
is_discount |
bool |
Only show discounted products | True / False |
is_fulfillment |
bool |
Only Fulfilled by Tokopedia | True / False |
is_plus |
bool |
Only Tokopedia PLUS sellers | True / False |
cod |
bool |
Cash on delivery available | True / False |
Tokopaedi supports data enrichment to attach detailed product information and customer reviews directly to search results. This is useful when you want to go beyond basic search metadata and analyze full product details or customer feedback.
# Enrich search results
results = search("Asus Zenbook S14 32GB", max_result=10, debug=True, filters=filters)
results.enrich_details(debug=True)
results.enrich_reviews(max_result=50, debug=True)
# Enrich product detail with reviews
product = get_product(url="https://www.tokopedia.com/asusrogindonesia/asus-tuf-a15-fa506ncr-ryzen-7-7435hs-rtx3050-4gb-8gb-512gb-w11-ohs-o365-15-6fhd-144hz-ips-rgb-blk-r735b1t-om-laptop-8gb-512gb-4970d?extParam=whid%3D17186756&aff_unique_id=&channel=others&chain_key=")
product.enrich_reviews(max_result=50, debug=True)
Enrichment methods are available on both the SearchResults
container and individual ProductData
objects:
-
enrich_details(debug: bool = False) -> None
Enriches all items in the result with detailed product info.debug
: IfTrue
, logs each enrichment step.
-
enrich_reviews(max_result: int = 10, debug: bool = False) -> None
Enriches all items with customer reviews (up tomax_result
per product).max_result
: Number of reviews to fetch for each product.debug
: IfTrue
, logs the review enrichment process.
-
enrich_details(debug: bool = False) -> None
Enriches this specific product with detailed information. -
enrich_reviews(max_result: int = 10, debug: bool = False) -> None
Enriches this product with customer reviews.
This design allows for flexibility: enrich a full result set at once, or enrich individual items selectively as needed.
Tokopaedi is fully compatible with Jupyter Notebook, making it easy to explore and manipulate data interactively. You can perform searches, enrich product details and reviews, and convert results to pandas DataFrames for analysis all from a notebook environment.
from tokopaedi import search, SearchFilters, get_product, get_reviews
import json
import pandas as pd
from pandas import json_normalize
filters = SearchFilters(
bebas_ongkir_extra = True,
pmin = 15000000,
pmax = 25000000,
rt = 4.5
)
results = search("Asus Zenbook S14 32GB", max_result=10, debug=False, filters=filters)
# Enrich each result with product details and reviews
results.enrich_details(debug=False)
results.enrich_reviews(max_result=50, debug=False)
# Convert to DataFrame and preview important fields
df = json_normalize(results.json())
df[["product_id", "product_name", "price", "price_original","discount_percentage","rating","shop.name"]].head()
# Reviews
df.iloc[0].reviews
# Retrieve single product by url
product = get_product(url="https://www.tokopedia.com/asusrogindonesia/asus-tuf-a15-fa506ncr-ryzen-7-7435hs-rtx3050-4gb-8gb-512gb-w11-ohs-o365-15-6fhd-144hz-ips-rgb-blk-r735b1t-om-laptop-8gb-512gb-4970d?extParam=whid%3D17186756&aff_unique_id=&channel=others&chain_key=")
product.enrich_reviews(max_result=50, debug=True)
df = json_normalize(product.json())
df[["product_id", "product_name", "price", "price_original","discount_percentage","rating","shop.name"]].head()
- Update search query
- Add
ProductData.description
to extract description
- Fix image link on documentation for PyPi release
- Improve price accuracy with user spoofing (mobile pricing)
- Shop type conistency
- Minor extractor fix
- Replaced
ProductSearchResult
withProductData
for a unified model - Removed
combine_data
and replace it with enrichment functions - Added
.enrich_details(debug=False)
toProductData
andSearchResults
- Added
.enrich_reviews(max_result=10, debug=False)
toProductData
andSearchResults
- Added
url
parameter toget_reviews()
andget_product()
for direct product URL support
- Improved documentation and metadata
- Initial release with:
search()
function with filtersget_product()
for detailed product infoget_reviews()
for customer reviews
Created by Hilmi Azizi. For inquiries, feedback, or collaboration, contact me at root@hilmiazizi.com. You can also reach out via GitHub Issues for bug reports or feature suggestions.
This project is licensed under the MIT License.
You are free to use, modify, and distribute this project with attribution. See the LICENSE file for more details.