Add script to fetch benchmark results for execuTorch #11734
Merged (+2,048 −115)
# Executorch Benchmark Tooling

A library providing tools for fetching, processing, and analyzing ExecutorchBenchmark data from the HUD Open API. This tooling helps compare performance metrics between private and public devices with identical settings.

## Table of Contents

- [Overview](#overview)
- [Installation](#installation)
- [Tools](#tools)
  - [get_benchmark_analysis_data.py](#get_benchmark_analysis_datapy)
    - [Quick Start](#quick-start)
    - [Command Line Options](#command-line-options)
    - [Example Usage](#example-usage)
    - [Working with Output Files](#working-with-output-files-csv-and-excel)
    - [Python API Usage](#python-api-usage)
- [Running Unit Tests](#running-unit-tests)

## Overview

The Executorch Benchmark Tooling provides a suite of utilities designed to:

- Fetch benchmark data from the HUD Open API for specified time ranges
- Clean and process the data by filtering out failures
- Compare metrics between private and public devices with matching configurations
- Generate analysis reports in various formats (CSV, Excel, JSON)
- Support filtering by device pools, backends, and models

This tooling is particularly useful for performance analysis, regression testing, and cross-device comparisons.

## Installation

Install the dependencies:

```bash
pip install -r requirements.txt
```

## Tools

### get_benchmark_analysis_data.py

This script generates analysis data comparing private devices with public devices that use the same settings.

It fetches benchmark data from the HUD Open API for a specified time range, cleans the data by removing entries with FAILURE indicators, and retrieves all private device metrics along with the equivalent public device metrics based on matching [model, backend, device_pool_names, arch] configurations. Users can filter the data by specifying private device_pool_names, backends, and models.
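The private/public matching described above can be sketched as follows. This is an illustration only, not the script's actual internals; the helper name and row fields are assumptions made for the example:

```python
from collections import defaultdict

def pair_private_with_public(private_rows, public_rows,
                             key_fields=("model", "backend", "arch")):
    """Illustrative pairing: index public rows by a configuration key,
    then attach the matching public rows to each private row."""
    public_by_key = defaultdict(list)
    for row in public_rows:
        public_by_key[tuple(row[f] for f in key_fields)].append(row)
    return [
        (row, public_by_key[tuple(row[f] for f in key_fields)])
        for row in private_rows
    ]

# Toy data: one private configuration, two public ones
private = [{"model": "mv3", "backend": "qnn-q8", "arch": "android"}]
public = [
    {"model": "mv3", "backend": "qnn-q8", "arch": "android"},
    {"model": "mv2", "backend": "qnn-q8", "arch": "android"},
]
pairs = pair_private_with_public(private, public)
print(len(pairs[0][1]))  # prints 1: one public row shares the private row's key
```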
#### Quick Start

```bash
# Generate Excel sheets for all private devices, alongside public devices using the same settings
python3 .ci/scripts/benchmark_tooling/get_benchmark_analysis_data.py \
  --startTime "2025-06-11T00:00:00" \
  --endTime "2025-06-17T18:00:00" \
  --outputType "excel"

python3 .ci/scripts/benchmark_tooling/analyze_benchmark_stability.py \
  --primary-file private.xlsx \
  --reference-file public.xlsx
```

#### Command Line Options

##### Basic Options:

- `--startTime`: Start time in ISO format (e.g., "2025-06-11T00:00:00") (required)
- `--endTime`: End time in ISO format (e.g., "2025-06-17T18:00:00") (required)
- `--env`: Environment to query ("local" or "prod"; default: "prod")
- `--no-silent`: Show processing logs (by default, only results and minimal logging are shown)

##### Output Options:

- `--outputType`: Output format (default: "print")
  - `print`: Display results in the console
  - `json`: Generate a JSON file
  - `df`: Return results as DataFrames: `{'private': List[{'groupInfo': Dict, 'df': DataFrame}], 'public': List[{'groupInfo': Dict, 'df': DataFrame}]}`
  - `excel`: Generate Excel files with multiple sheets; the cell in the first row and first column of each sheet contains a JSON string of the raw metadata
  - `csv`: Generate CSV files in separate folders; the cell in the first row and first column of each file contains a JSON string of the raw metadata
- `--outputDir`: Directory to save output files (default: current directory)
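For orientation, the `df` output shape described above can be traversed like this. The data values are made up, and plain lists of dicts stand in for the pandas DataFrames the script actually returns:

```python
# Hypothetical data in the 'df' output shape: each side holds a list of
# {'groupInfo': ..., 'df': ...} entries, one per device/model/backend group.
results = {
    "private": [
        {"groupInfo": {"model": "mv3", "backend": "qnn-q8"},
         "df": [{"metric": "avg_inference_latency(ms)", "value": 12.3}]},
    ],
    "public": [
        {"groupInfo": {"model": "mv3", "backend": "qnn-q8"},
         "df": [{"metric": "avg_inference_latency(ms)", "value": 14.1}]},
    ],
}

# Walk both sides and collect (side, model, backend) for each group
seen = []
for side in ("private", "public"):
    for entry in results[side]:
        info = entry["groupInfo"]
        seen.append((side, info["model"], info["backend"]))
print(seen)
```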
##### Filtering Options:

- `--private-device-pools`: Filter by private device pool names (e.g., "samsung-galaxy-s22-5g", "samsung-galaxy-s22plus-5g")
- `--backends`: Filter by specific backend names (e.g., "qnn-q8", "llama3-spinquan")
- `--models`: Filter by specific model names (e.g., "mv3", "meta-llama-llama-3.2-1b-instruct-qlora-int4-eo8")

#### Example Usage

Filter by multiple private device pools and models:

```bash
# This fetches all private table data for models 'llama-3.2-1B' and 'mv3'
python3 get_benchmark_analysis_data.py \
  --startTime "2025-06-01T00:00:00" \
  --endTime "2025-06-11T00:00:00" \
  --private-device-pools 'apple_iphone_15_private' 'samsung_s22_private' \
  --models 'meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8' 'mv3'
```

Filter by a specific device pool and models:

```bash
# This fetches all private iPhone table data for models 'llama-3.2-1B' and 'mv3',
# and the associated public iPhone data
python3 get_benchmark_analysis_data.py \
  --startTime "2025-06-01T00:00:00" \
  --endTime "2025-06-11T00:00:00" \
  --private-device-pools 'apple_iphone_15_private' \
  --models 'meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8' 'mv3'
```
#### Working with Output Files (CSV and Excel)

You can use the methods in `common.py` to convert the file data back to DataFrame format. These methods read the metadata stored in the first row of the CSV/Excel files and return results as a list of `{"groupInfo": Dict, "df": pandas.DataFrame}`.

```python
import logging
logging.basicConfig(level=logging.INFO)

# common.py lives in .ci/scripts/benchmark_tooling; make it importable first,
# e.g. by adding that directory to sys.path (a leading-dot package path such
# as `.ci.scripts...` is not a valid Python import)
import sys
sys.path.append(".ci/scripts/benchmark_tooling")
from common import read_all_csv_with_metadata, read_excel_with_json_header

# For CSV files (assuming the 'private' folder is in the current directory)
folder_path = "./private"
res = read_all_csv_with_metadata(folder_path)
logging.info(res)

# For Excel files (assuming the Excel file is in the current directory)
file_path = "./private.xlsx"
res = read_excel_with_json_header(file_path)
logging.info(res)
```
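As a concrete illustration of the file layout (JSON metadata in the first cell, the benchmark table below it), here is a minimal stdlib-only sketch. The metadata fields and column names are invented for the example:

```python
import csv
import io
import json

# Build a tiny in-memory CSV in the layout described above: the first row's
# first cell is a JSON string of the group metadata, and the table
# (header row + data rows) starts on the second row.
buf = io.StringIO()
writer = csv.writer(buf)
writer.writerow([json.dumps({"model": "mv3", "backend": "qnn-q8"})])
writer.writerow(["metric", "value"])
writer.writerow(["avg_inference_latency(ms)", "12.3"])
buf.seek(0)

# Parse it back: metadata first, then the table
reader = csv.reader(buf)
group_info = json.loads(next(reader)[0])  # metadata from the first cell
header = next(reader)                     # table header
rows = list(reader)                       # data rows

print(group_info, header, rows)
```

The csv module quotes the embedded JSON (which contains commas and double quotes) automatically, so the metadata round-trips cleanly through a single cell.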
#### Python API Usage

To use the benchmark fetcher in your own scripts:

```python
# get_benchmark_analysis_data.py lives in .ci/scripts/benchmark_tooling;
# add that directory to sys.path so the import resolves
import sys
sys.path.append(".ci/scripts/benchmark_tooling")
from get_benchmark_analysis_data import ExecutorchBenchmarkFetcher

# Initialize the fetcher
fetcher = ExecutorchBenchmarkFetcher(env="prod", disable_logging=False)

# Fetch data for a specific time range
fetcher.run(
    start_time="2025-06-11T00:00:00",
    end_time="2025-06-17T18:00:00",
)

# Get results in different formats
# As DataFrames
df_results = fetcher.to_df()

# Export to Excel
fetcher.to_excel(output_dir="./results")

# Export to CSV
fetcher.to_csv(output_dir="./results")

# Export to JSON
json_path = fetcher.to_json(output_dir="./results")

# Get raw dictionary results
dict_results = fetcher.to_dict()

# Use the output_data method for flexible output
results = fetcher.output_data(output_type="excel", output_dir="./results")
```
## Running Unit Tests

The benchmark tooling includes unit tests to ensure functionality.

### Using pytest for unit tests

```bash
# From the executorch root directory
pytest -c /dev/null .ci/scripts/tests/test_get_benchmark_analysis_data.py
```