feat(data): Added automated CPU batch updater #841
Thank you very much, that's a great improvement! I will review it in detail this Wednesday. If you want to fix the pre-commit error, see https://github.com/mlco2/codecarbon/blob/master/CONTRIBUTING.md#coding-style--linting for how to install it locally.
Thank you @benoit-cty, it now passes!
Pull Request Overview
This PR introduces a new automated CPU batch updater that scrapes and processes CPU power consumption data for Intel and AMD processors, then aggregates the results.
- Implements a new script to fetch Intel CPU data via web scraping and process AMD CPU datasets from TechPowerUp.
- Aggregates data into a unified CSV (cpu_power.csv) and validates file existence and size.
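The existence-and-size check described above could look roughly like this. It is a minimal sketch that assumes "size" means a minimum number of data rows per CSV. `count_data_rows`, `validate_outputs` and any thresholds you pass in are hypothetical, not the PR's actual code.

```python
import csv
from pathlib import Path
from typing import Dict, List


def count_data_rows(path: Path) -> int:
    """Return the number of non-blank rows after the header line."""
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if any
        return sum(1 for row in reader if row)  # csv yields [] for blank lines


def validate_outputs(min_rows: Dict[str, int]) -> List[str]:
    """Check each CSV exists and has at least the given number of data rows.

    Returns a list of human-readable problems; an empty list means all good.
    """
    problems = []
    for name, minimum in min_rows.items():
        path = Path(name)
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        rows = count_data_rows(path)
        if rows < minimum:
            problems.append(f"{name}: only {rows} data rows (expected >= {minimum})")
    return problems
```

For example, `validate_outputs({"cpu_power.csv": 1000})` returns an empty list when the file is present and large enough. The threshold here is purely illustrative. Returning a list of problems, rather than raising on the first one, reports every bad file in a single run.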
When I ran the script, it did not find any pages. Does it still work for you? I'm trying to get Manus.im to write a script for us that scrapes https://www.intel.com/content/www/us/en/ark/featurefilter.html?productType=873&3_MaxTDP-Min=0.03&3_MaxTDP-Max=500 and clicks "Show more".
The ARK database at https://www.intel.com/libs/apps/intel/support/ark/advancedFilterSearch?productType=873&3_MaxTDP-Min=0.03&3_MaxTDP-Max=500&forwardPath=/content/www/us/en/ark/featurefilter.html&pageNo=1&sort=&sortType= does not contain all Intel CPUs; for example, it is missing "Intel Xeon Gold 6133". I pushed one script to do the scraping and another to do the merge.
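One way to collect every result page from the `advancedFilterSearch` endpoint quoted above is to increment its `pageNo` parameter until a page comes back empty. This is a minimal sketch. It assumes that the endpoint paginates through `pageNo` and that an empty page marks the end, and neither behaviour is confirmed in this thread. `fetch_rows` is a hypothetical callable (for example, a wrapper around requests plus a parser) that is passed in so the loop can be tested offline. Pagination alone would not recover CPUs that the ARK filter omits, such as the Xeon Gold 6133 mentioned above.

```python
from typing import Callable, Iterator, List
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE_URL = (
    "https://www.intel.com/libs/apps/intel/support/ark/advancedFilterSearch"
    "?productType=873&3_MaxTDP-Min=0.03&3_MaxTDP-Max=500"
    "&forwardPath=/content/www/us/en/ark/featurefilter.html"
    "&pageNo=1&sort=&sortType="
)


def page_url(base: str, page: int) -> str:
    """Return `base` with its pageNo query parameter set to `page`."""
    parts = urlsplit(base)
    # keep_blank_values preserves the empty sort= and sortType= parameters
    query = [
        (key, str(page) if key == "pageNo" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # safe="/" keeps forwardPath readable instead of percent-encoding slashes
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


def iter_all_rows(
    fetch_rows: Callable[[str], List[dict]],
    base: str = BASE_URL,
    max_pages: int = 500,
) -> Iterator[dict]:
    """Yield rows page by page, stopping at the first empty page.

    `fetch_rows` is a hypothetical downloader/parser for one result page;
    `max_pages` is a safety cap in case the end is never detected.
    """
    for page in range(1, max_pages + 1):
        rows = fetch_rows(page_url(base, page))
        if not rows:
            return
        yield from rows
```

Injecting `fetch_rows` keeps the HTTP and parsing details separate from the paging logic. The same loop would then work with requests/BeautifulSoup or with a browser-driven fetcher.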
Hello, I've changed the scripts and added documentation. Can you give it a try and let me know if everything works on your side?
Pull Request Overview
This PR introduces a unified, automated pipeline to scrape and merge Intel and AMD CPU TDP data into the existing cpu_power.csv.
- Adds two scrapers (intel_cpu_scrapper.py, amd_cpu_scrapper.py) to fetch CPU specs from Intel ARK and AMD product pages.
- Adds merge_scrapped_cpu_power.py to clean, merge, and update the master CPU power CSV.
- Bulk-updates AMD server and desktop CSV datasets, and documents the workflow in a new README.
Reviewed Changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 3 comments.
File | Description |
---|---|
merge_scrapped_cpu_power.py | New script to clean names, merge Intel/AMD TDP data, and update cpu_power.csv |
intel_cpu_scrapper.py | Adds an Intel CPU scraper using requests & BeautifulSoup |
amd_cpu_scrapper.py | Adds an AMD CPU scraper using Playwright |
amd_cpu_server_dataset.csv | Bulk updates AMD server CPU dataset entries |
amd_cpu_desktop_dataset.csv | Bulk updates AMD desktop CPU dataset entries |
README.md | Instructions for running scrapers and merge script |
Comments suppressed due to low confidence (2)
codecarbon/data/hardware/cpu_dataset_builder/intel_cpu_scrapper.py:1
- [nitpick] The file is named 'intel_cpu_scrapper.py' but the class is 'IntelCpuScraper'. Consider renaming the file to 'intel_cpu_scraper.py' to align spelling and conventions.
#!/usr/bin/env python3
codecarbon/data/hardware/cpu_dataset_builder/merge_scrapped_cpu_power.py:1
- There are no tests covering this new merging script. Consider adding unit or integration tests to validate name cleaning, TDP extraction, and merge logic.
This script updates the CPU power data by reading from Intel and AMD CPU data file,
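As a starting point for the tests the review asks for, name cleaning and TDP extraction can be factored into pure functions that are easy to assert on. Both helpers and their rules are hypothetical: they strip ®, ™, (R) and (TM), drop a trailing "Processor", and read the first "N W" value. They may not match the merge script's real behaviour.

```python
import re
import unittest
from typing import Optional

# Trademark marks as they appear in vendor names: ®, ™, (R), (TM)
_TRADEMARKS = re.compile(r"[®™]|\((?:R|TM)\)", re.IGNORECASE)
# A wattage such as "125 W", "125W" or "65 Watts"
_WATTS = re.compile(r"(\d+(?:\.\d+)?)\s*W(?:atts?)?\b")


def clean_name(raw: str) -> str:
    """Drop trademark marks and a trailing 'Processor', then collapse spaces."""
    name = _TRADEMARKS.sub("", raw).strip()
    name = re.sub(r"\s+Processor$", "", name, flags=re.IGNORECASE)
    return " ".join(name.split())


def extract_tdp(text: str) -> Optional[float]:
    """Return the first wattage found in `text`, or None if there is none."""
    match = _WATTS.search(text)
    return float(match.group(1)) if match else None


class CleaningTests(unittest.TestCase):
    def test_clean_name_strips_marks_and_suffix(self):
        self.assertEqual(clean_name("Intel® Xeon® Gold 6133 Processor"),
                         "Intel Xeon Gold 6133")

    def test_extract_tdp(self):
        self.assertEqual(extract_tdp("TDP: 125 W"), 125.0)
        self.assertIsNone(extract_tdp("N/A"))
```

If this is saved as a `test_*.py` file, `python -m pytest` or `python -m unittest` will collect `CleaningTests`.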
codecarbon/data/hardware/cpu_dataset_builder/merge_scrapped_cpu_power.py
codecarbon/data/hardware/cpu_dataset_builder/merge_scrapped_cpu_power.py
Outdated
> - This script provides a complete, automated solution to update CPU power consumption data for both Intel and AMD processors in one run.
>
> Addresses issue #840
…u_power.py
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Thanks @benoit-cty for refining the script and making the necessary changes.
cc @benoit-cty
After looking at the CPU_Create_Dataset.ipynb notebook in the CodeCarbon repository, I wrote this script to provide a complete, automated way to update CPU power consumption data for both Intel and AMD processors in one run. It addresses what you requested by:
1. Unified processing: handling both Intel and AMD CPUs in a single run while keeping the existing file structure.
2. Automated data collection.
3. Updating all relevant files in one run.
4. Validation: checking that all files exist and contain sufficient data.
Maintenance
- The script uses a single command, `python -m codecarbon.data.hardware.cpu_batch_updater`, to update everything.

If there are any changes or suggestions, I am more than willing to make them to the best of my knowledge.
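Here is a rough picture of how one command can drive the whole update. A `main()` runs each stage in order and returns a non-zero exit code as soon as one fails. The stage list uses placeholder lambdas, and `run_pipeline`, `main` and the stage names are hypothetical. The real module's functions are not shown in this thread.

```python
import sys
from typing import Callable, List, Tuple

# A named stage that returns True on success
Step = Tuple[str, Callable[[], bool]]


def run_pipeline(steps: List[Step]) -> int:
    """Run steps in order; stop at the first failure and return an exit code."""
    for name, step in steps:
        print(f"[cpu-update] {name} ...")
        if not step():
            print(f"[cpu-update] {name} failed", file=sys.stderr)
            return 1
    print("[cpu-update] all steps succeeded")
    return 0


def main() -> int:
    steps: List[Step] = [
        ("scrape Intel", lambda: True),  # placeholder for the Intel scraper
        ("scrape AMD", lambda: True),    # placeholder for the AMD scraper
        ("merge", lambda: True),         # placeholder for the merge script
        ("validate", lambda: True),      # placeholder for the file checks
    ]
    return run_pipeline(steps)
```

The module that `python -m` runs could end with `if __name__ == "__main__": sys.exit(main())`. That guard would turn the return value into the process exit code, which CI jobs can check.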
Thank you!
Addresses issue #840