
Conversation

mohitmundhragithub
Contributor

No description provided.

@mohitmundhragithub mohitmundhragithub requested review from a team and anhappdev as code owners August 26, 2025 05:41

github-actions bot commented Aug 26, 2025

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

…lemented performance benchmark for LLM pipeline
…y input and issue_query only handles output tokens
@farook-edev farook-edev changed the title from "Feat llm" to "LLM pipeline implementation" Sep 2, 2025
@farook-edev farook-edev linked an issue Sep 2, 2025 that may be closed by this pull request
Contributor Author

@mohitmundhragithub mohitmundhragithub left a comment

.

namespace mobile {

// A method to be called by the backend as soon as the first token is generated (only for token based benchmarks)
static void FirstTokenCallback(void* context) {
Contributor Author

What is the context argument used for?

Contributor

The context holds the arguments that get passed to LoadGen. They are created by the driver and sent to the backend. The backend only needs to pass them to the callback without reading or modifying them.
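
To illustrate the pass-through pattern described here, a minimal sketch follows. Every name in it (DriverContext, OnFirstToken, FakeBackendGenerate, RunExample) is hypothetical and not taken from this PR; it only shows a driver owning the context while the backend forwards the opaque pointer unchanged.

```cpp
#include <cstdint>

// Hypothetical names throughout; these are not the PR's actual types.

// The callback signature the backend sees: an opaque pointer it never inspects.
using FirstTokenFn = void (*)(void* context);

// Driver-owned state. Only the driver knows this layout.
struct DriverContext {
  std::uint64_t query_id = 0;
  int first_token_events = 0;
};

// Driver-side callback: casts the opaque pointer back to the driver's own type.
static void OnFirstToken(void* context) {
  auto* ctx = static_cast<DriverContext*>(context);
  ++ctx->first_token_events;
}

// Stand-in for a backend: produces tokens and forwards `context` untouched
// as soon as the first token exists.
static int FakeBackendGenerate(int num_tokens, FirstTokenFn on_first_token,
                               void* context) {
  int produced = 0;
  for (int i = 0; i < num_tokens; ++i) {
    ++produced;
    if (i == 0 && on_first_token != nullptr) {
      on_first_token(context);
    }
  }
  return produced;
}

// Usage: the driver creates the context, hands it to the backend, and reads
// the result back after generation finishes.
static int RunExample() {
  DriverContext ctx;
  ctx.query_id = 42;
  FakeBackendGenerate(5, &OnFirstToken, &ctx);
  return ctx.first_token_events;
}
```

Because the backend only ever sees a void*, the driver can change what it stores in the context without touching any backend code.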

Contributor

@freedomtan to check it.

@farook-edev
Contributor

@freedomtan I need access to Google Cloud to see what's wrong with the Windows build. Could you please provide access?

@farook-edev
Contributor

@freedomtan @anhappdev The iOS test seems to hit a time limit while installing dependencies. Is this normal, or is it related to this PR?

@freedomtan
Contributor

@anhappdev could you help check the Windows and iOS cases?

@anhappdev
Collaborator

The iOS build has a timeout of 180 minutes, which is why it stopped prematurely (cancelled).

Here's the log for the Windows build:
windows.log

@freedomtan
Contributor

The tflite backend doesn't run on either Pixel 9 or Pixel 10.
the apk I used: https://github.com/mlcommons/mobile_app_open/actions/runs/18272073358/artifacts/4189457127
I tested all 3 models.

10-07 10:18:41.148 31523 31697 I native  : cpp/flutter/dart_run_benchmark.cc:65 li:flutter/cpp/flutter/dart_run_benchmark.cc:65@dart_ffi_run_benchmark
10-07 10:18:41.150 31523 31697 I native  : cpp/backends/external.cc:195 Using backend allocator
10-07 10:18:41.150 31523 31697 I native  : cpp/backend_tflite/tflite_c.cc:173 Using TfLite 2.18.0 With Schema 3
10-07 10:18:41.150 31523 31697 I native  : cpp/backend_tflite/tflite_c.cc:50 Initializing LLMPipeline
10-07 10:18:41.150 31523 31697 E tflite  : Mmap of '184' at offset '0' failed with error '13'.
10-07 10:18:41.150 31523 31697 E native  : cpp/backend_tflite/llm_pipeline.cc:62 Failed to load model: /data/user/0/org.mlcommons.android.mlperfbench/cache/symlinks/CPU
10-07 10:18:41.150 31523 31697 F native  : cpp/backends/external.cc:212 Failed to create external backend
10-07 10:18:41.146 31523 31523 W DartWorker: type=1400 audit(0.0:2455): avc:  denied  { map } for  path="/data/data/org.mlcommons.android.mlperfbench/cache/symlinks/CPU" dev="dm-60" ino=82460 scontext=u:r:untrusted_app:s0:c70,c257,c512,c768 tcontext=u:object_r:app_data_file:s0:c70,c257,c512,c768 tclass=dir permissive=0 app=org.mlcommons.android.mlperfbench
10-07 10:18:41.194 31717 31717 W linker64: type=1400 audit(0.0:2456): avc:  denied  { search } for  name="tests" dev="dm-60" ino=117 scontext=u:r:untrusted_app:s0:c70,c257,c512,c768 tcontext=u:object_r:shell_test_data_file:s0 tclass=dir permissive=0 app=org.mlcommons.android.mlperfbench
10-07 10:18:41.194 31717 31717 W linker64: type=1400 audit(0.0:2457): avc:  denied  { search } for  name="tests" dev="dm-60" ino=117 scontext=u:r:untrusted_app:s0:c70,c257,c512,c768 tcontext=u:object_r:shell_test_data_file:s0 tclass=dir permissive=0 app=org.mlcommons.android.mlperfbench
10-07 10:18:41.194 31717 31717 W linker64: type=1400 audit(0.0:2458): avc:  denied  { search } for  name="tests" dev="dm-60" ino=117 scontext=u:r:untrusted_app:s0:c70,c257,c512,c768 tcontext=u:object_r:shell_test_data_file:s0 tclass=dir permissive=0 app=org.mlcommons.android.mlperfbench
10-07 10:18:41.194 31717 31717 W linker64: type=1400 audit(0.0:2459): avc:  denied  { search } for  name="tests" dev="dm-60" ino=117 scontext=u:r:untrusted_app:s0:c70,c257,c512,c768 tcontext=u:object_r:shell_test_data_file:s0 tclass=dir permissive=0 app=org.mlcommons.android.mlperfbench
10-07 10:18:41.208 31717 31717 E chromium: [31717:31717:20251007,101841.208585:ERROR process_memory_linux.cc:50] pread64: I/O error (5)
10-07 10:18:41.237 31717 31717 E chromium: [31717:31717:20251007,101841.237069:ERROR elf_dynamic_array_reader.h:64] tag not found
10-07 10:18:41.241  1460  1555 I DisplayDeviceRepository: Display device changed render timings: "Built-in Screen", renderFrameRate=120.00002, presentationDeadlineNanos=11499999, appVsyncOffsetNanos=6233332, frameRateOverrides=[{uid=10227 frameRateHz=60.00001}]
10-07 10:18:41.245 31717 31717 W chromium: [31717:31717:20251007,101841.245336:WARNING thread_snapshot_linux.cc:112] Unknown scheduling policy 1073741824
10-07 10:18:41.246 31717 31717 W chromium: [31717:31717:20251007,101841.246317:WARNING thread_snapshot_linux.cc:112] Unknown scheduling policy 1073741824
10-07 10:18:41.246 31717 31717 W chromium: [31717:31717:20251007,101841.246467:WARNING thread_snapshot_linux.cc:112] Unknown scheduling policy 1073741824
10-07 10:18:41.247 31717 31717 W chromium: [31717:31717:20251007,101841.247573:WARNING thread_snapshot_linux.cc:112] Unknown scheduling policy 1073741824
10-07 10:18:41.248 31717 31717 W chromium: [31717:31717:20251007,101841.248331:WARNING thread_snapshot_linux.cc:112] Unknown scheduling policy 1073741824
10-07 10:18:41.250 31717 31717 W chromium: [31717:31717:20251007,101841.250181:WARNING thread_snapshot_linux.cc:112] Unknown scheduling policy 1073741824
10-07 10:18:41.250 31717 31717 W chromium: [31717:31717:20251007,101841.250315:WARNING thread_snapshot_linux.cc:112] Unknown scheduling policy 1073741824
10-07 10:18:41.258 31717 31717 W linker64: type=1400 audit(0.0:2460): avc:  denied  { search } for  name="battery" dev="sysfs" ino=78141 scontext=u:r:untrusted_app:s0:c70,c257,c512,c768 tcontext=u:object_r:sysfs_batteryinfo:s0 tclass=dir permissive=0 app=org.mlcommons.android.mlperfbench
10-07 10:18:41.265 31523 31697 F libc    : Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 31697 (DartWorker), pid 31523 (oid.mlperfbench)
10-07 10:18:41.317 31722 31722 I crash_dump64: obtaining output fd from tombstoned, type: kDebuggerdTombstoneProto
10-07 10:18:41.318   649   649 I tombstoned: received crash request for pid 31697
10-07 10:18:41.322 31722 31722 I crash_dump64: performing dump of process 31523 (target tid = 31697)
10-07 10:18:41.439 31722 31722 F DEBUG   : *** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
10-07 10:18:41.439 31722 31722 F DEBUG   : Build fingerprint: 'google/komodo/komodo:16/BP3A.250905.014/13873947:user/release-keys'
10-07 10:18:41.439 31722 31722 F DEBUG   : Kernel Release: '6.1.134-android14-11-g66e758f7d0c0-ab13748739'
10-07 10:18:41.439 31722 31722 F DEBUG   : Revision: 'MP1.0'
10-07 10:18:41.439 31722 31722 F DEBUG   : ABI: 'arm64'
10-07 10:18:41.439 31722 31722 F DEBUG   : Timestamp: 2025-10-07 10:18:41.333060132+0800
10-07 10:18:41.439 31722 31722 F DEBUG   : Process uptime: 98s
10-07 10:18:41.439 31722 31722 F DEBUG   : Executable: /system/bin/app_process64
10-07 10:18:41.439 31722 31722 F DEBUG   : Cmdline: org.mlcommons.android.mlperfbench
10-07 10:18:41.439 31722 31722 F DEBUG   : pid: 31523, tid: 31697, name: DartWorker  >>> org.mlcommons.android.mlperfbench <<<
10-07 10:18:41.439 31722 31722 F DEBUG   : uid: 10326
10-07 10:18:41.439 31722 31722 F DEBUG   : tagged_addr_ctrl: 000000000007fff1 (PR_TAGGED_ADDR_ENABLE, mask 0xfffe)
10-07 10:18:41.439 31722 31722 F DEBUG   : pac_enabled_keys: 000000000000000f (PR_PAC_APIAKEY, PR_PAC_APIBKEY, PR_PAC_APDAKEY, PR_PAC_APDBKEY)
10-07 10:18:41.439 31722 31722 F DEBUG   : esr: 0000000092000006 (Data Abort Exception 0x24)
10-07 10:18:41.439 31722 31722 F DEBUG   : signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
10-07 10:18:41.439 31722 31722 F DEBUG   : Abort message: 'cpp/backends/external.cc:212 Failed to create external backend'
10-07 10:18:41.439 31722 31722 F DEBUG   :     x0  0000000000000000  x1  0000000000007bd1  x2  0000000000000006  x3  0000006ce7aaed20
10-07 10:18:41.439 31722 31722 F DEBUG   :     x4  000000000000000a  x5  000000000000000a  x6  000000000000000a  x7  203a206576697461
10-07 10:18:41.439 31722 31722 F DEBUG   :     x8  00000000000000f0  x9  f315c3f7ef101518  x10 0000000000000001  x11 00000070bff06730
10-07 10:18:41.439 31722 31722 F DEBUG   :     x12 0000000000040004  x13 000000007fffffff  x14 0000000000000000  x15 00000d7fdc738afa
10-07 10:18:41.439 31722 31722 F DEBUG   :     x16 00000070bff710f0  x17 00000070bff56d40  x18 0000006cd8eb0000  x19 0000000000007b23
10-07 10:18:41.439 31722 31722 F DEBUG   :     x20 0000000000007bd1  x21 00000000ffffffff  x22 0000006ce7aaede9  x23 0000000000000020
...

@farook-edev
Contributor

This seems to be the culprit:

10-07 10:18:41.150 31523 31697 E native : cpp/backend_tflite/llm_pipeline.cc:62 Failed to load model: /data/user/0/org.mlcommons.android.mlperfbench/cache/symlinks/CPU

It seems the pipeline isn't getting a proper path to the model.

@farook-edev
Contributor

farook-edev commented Oct 7, 2025

Quality Gate failed

Failed conditions 3.1% Duplication on New Code (required ≤ 3%)

See analysis details on SonarQube Cloud

The duplicated code is the dataset interface (override function declarations).

@freedomtan
Contributor

something like

adb shell /data/local/tmp/mlperf_main external llm --mode=AccuracyOnly --output_dir=/data/local/tmp/test_llm_output  --model_file=/sdcard/Android/data/org.mlcommons.android.mlperfbench/files/mlperf_models/llama_q8_ekv3072.tflite  --sp_path=/sdcard/Android/data/org.mlcommons.android.mlperfbench/files/mlperf_models/llama3_1b.spm.model  --input_tfrecord=/sdcard/Android/data/org.mlcommons.android.mlperfbench/files/mlperf_datasets/tinymmlu/data.tfrecord --lib_path=/data/local/tmp/libtflitebackend.so

runs on Pixel devices.

@anhappdev
Collaborator

This PR should resolve the issue with the iOS build: #1064
However, the Windows build still fails. Here's the log: 2025-10-07-windows.log

@farook-edev
Contributor

This PR should resolve the issue with the iOS build: #1064
However, the Windows build still fails. Here's the log: 2025-10-07-windows.log

Thanks a ton! I'll look into the Windows issue.

@freedomtan
Contributor

@freedomtan to test the app (and the accuracy of tinyMMLU).

@freedomtan
Contributor

for performance:

  • time to first token
  • tokens/s

for accuracy:

  • tinyMMLU
  • ifeval
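
As a rough illustration of the two performance metrics listed above, here is a minimal sketch. The thread does not define how tokens/s is measured, so the choice to divide output tokens by the whole query duration is an assumption, and LlmTimings, TimeToFirstTokenSeconds, and TokensPerSecond are hypothetical names rather than code from this PR.

```cpp
#include <chrono>
#include <cstddef>

using Clock = std::chrono::steady_clock;

// Hypothetical record of one query's timestamps; not a type from the PR.
struct LlmTimings {
  Clock::time_point query_issued;  // when the query was handed to the backend
  Clock::time_point first_token;   // when the first-token callback fired
  Clock::time_point last_token;    // when generation finished
  std::size_t output_tokens = 0;   // number of generated tokens
};

// Time to first token, in seconds.
inline double TimeToFirstTokenSeconds(const LlmTimings& t) {
  return std::chrono::duration<double>(t.first_token - t.query_issued).count();
}

// Output tokens per second over the whole query (assumed convention).
// Returns 0 when the elapsed time is not positive.
inline double TokensPerSecond(const LlmTimings& t) {
  const double total =
      std::chrono::duration<double>(t.last_token - t.query_issued).count();
  return total > 0.0 ? static_cast<double>(t.output_tokens) / total : 0.0;
}
```

A different convention, such as measuring throughput only after the first token, would change TokensPerSecond but leave TimeToFirstTokenSeconds as it is.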

Development

Successfully merging this pull request may close these issues.

Master issue: LLM Benchmark

4 participants