
Conversation

@metascroy (Contributor) commented on Oct 14, 2024:

Set up ET by following the directions here: https://pytorch.org/executorch/stable/getting-started-setup

After ET is set up, we can build the Llama runner with torchao kernels.

Step 1 (build ET):

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_MPS=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -Bcmake-out .
cmake --build cmake-out -j16 --target install --config Release

Step 2 (build runner with torchao):

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DEXECUTORCH_BUILD_TORCHAO=ON \
    -DTORCHAO_BUILD_EXECUTORCH_OPS=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_MPS=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -Bcmake-out/examples/models/llama \
    examples/models/llama
cmake --build cmake-out/examples/models/llama -j16 --target install --config Release

Step 3 (install runner requirements):

sh examples/models/llama/install_requirements.sh

Step 4 (export model):

CMAKE_INSTALL_PREFIX=$PWD/cmake-out python -m examples.models.llama.export_llama --checkpoint /path/to/model.pth --params /path/to/params.json -kv --use_sdpa_with_kv_cache -qmode "torchao:8da3w" --group_size 128 -E "torchao:3,32" --disable_dynamic_shape --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name /path/to/output.pte

The above quantizes the embeddings to 3 bits with group size 32 (-E "torchao:3,32") and quantizes the linear layers with 8-bit dynamically quantized activations and 3-bit weights with group size 128 (-qmode "torchao:8da3w"). You can experiment with other quantization schemes; 4-bit weights (instead of 3-bit) are a good starting point for model quality. torchao supports 1-7 bit quantization for both linear and embedding layers.
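For example, a 4-bit variant of the same export would swap in the flags below. (A sketch: the torchao:8da4w and torchao:4,32 spellings are extrapolated from the 3-bit flags above and the stated 1-7 bit support, so double-check them against the torchao docs.)

-qmode "torchao:8da4w" --group_size 128 -E "torchao:4,32"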

Step 5 (run model):

cmake-out/examples/models/llama/llama_main --model_path=/path/to/output.pte --tokenizer_path=/path/to/tokenizer.model --prompt="Once upon a time,"

Note: you can also export quantized PTE files with torchchat (https://github.com/pytorch/torchchat/blob/main/docs/quantization.md#executorch-1) and then run them using Step 5.

pytorch-bot (bot) commented on Oct 14, 2024:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6195

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 444d44c with merge base ddc8ea6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Oct 14, 2024.
endif()

if(EXECUTORCH_BUILD_TORCHAO)
  list(APPEND link_libraries "$<LINK_LIBRARY:WHOLE_ARCHIVE,${CMAKE_CURRENT_BINARY_DIR}/../../../lib/libtorchao_ops_executorch.a>")
@metascroy (Contributor, Author) commented on Oct 14, 2024:

Exporting targets with config in torchao makes this bit nicer. It could be:

set(torchao_DIR ${CMAKE_CURRENT_BINARY_DIR}/../../../lib/cmake/torchao)
find_package(torchao REQUIRED)
target_link_options_shared_lib(torchao::torchao_ops_executorch)
list(APPEND link_libraries torchao::torchao_ops_executorch)
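For reference, a minimal sketch of the install/export that would enable the find_package call above (hypothetical target and path names; not something torchao ships today):

# in torchao/experimental/CMakeLists.txt (hypothetical)
install(TARGETS torchao_ops_executorch EXPORT torchao-targets DESTINATION lib)
install(
  EXPORT torchao-targets
  NAMESPACE torchao::
  FILE torchao-config.cmake
  DESTINATION lib/cmake/torchao
)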

A contributor commented:

You already called add_subdirectory so here you can do something like this:

target_link_options(torchao_ops_executorch INTERFACE -Wl,--whole-archive -Wl,--no-whole-archive)

Or actually this line should live in torchao/experimental/CMakeLists.txt.
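A minimal sketch of the force-load pattern under discussion, assuming GNU ld flags (Apple's linker would need -force_load instead):

# hypothetical: make any consumer of the target force-load the whole static
# archive so the op registrations are not dropped by the linker
target_link_options(torchao_ops_executorch INTERFACE
  "SHELL:LINKER:--whole-archive $<TARGET_FILE:torchao_ops_executorch> LINKER:--no-whole-archive"
)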

CMakeLists.txt (outdated)

if(EXECUTORCH_BUILD_TORCHAO)
  add_compile_options("-frtti")
  set(EXECUTORCH_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/..)
@metascroy (Contributor, Author) commented:

The find_package(ExecuTorch) in torchao does not appear to define the variables EXECUTORCH_INCLUDE_DIRS and EXECUTORCH_LIBRARIES unless I run install_requirements.sh in torchao/experimental. But then I don't know if the op registration will work in examples/models/llama2.

A contributor commented:

I don't think these lines should live in the root-level CMakeLists.txt; it seems you only need them for the runner build, or the custom_ops build.

@metascroy force-pushed the torchao branch 2 times, most recently from ea358d5 to 3cc1d26, on October 23, 2024.
@facebook-github-bot commented:

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@metascroy requested a review from digantdesai on October 25, 2024.
@metascroy marked this pull request as ready for review on October 25, 2024.
@metascroy changed the title from "[Draft] torchao" to "Add torchao kernels to llama runner" on Oct 25, 2024.

if(EXECUTORCH_BUILD_TORCHAO)
  set(TORCHAO_BUILD_EXECUTORCH_OPS ON)
  add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/../../../third-party/ao/torchao/experimental ${CMAKE_CURRENT_BINARY_DIR}/../../../third-party/ao/torchao/experimental)
A contributor commented:

I think you can do ${EXECUTORCH_ROOT}/third-party/ao/torchao/experimental

@metascroy (Contributor, Author) replied:

But one is the source directory and the other the binary directory?

The contributor replied:

Yeah, I meant that the first one can be made simpler. But it's up to you.
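In other words, a sketch of the suggested simplification (only the source-directory argument changes; the binary directory stays as-is):

add_subdirectory(
  ${EXECUTORCH_ROOT}/third-party/ao/torchao/experimental
  ${CMAKE_CURRENT_BINARY_DIR}/../../../third-party/ao/torchao/experimental
)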

os.path.abspath(
    os.path.join(
        os.path.dirname(__file__),
        "../../../../cmake-out/third-party/ao/torchao/experimental/libtorchao_ops_aten.*",
    )
)
A contributor commented:

This hardcoded path is not ideal. Can we add install() in torchao/experimental/CMakeLists.txt, and then find the installed library in CMAKE_INSTALL_PREFIX?
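A minimal sketch of what that install() could look like (hypothetical; assumes the library target is named torchao_ops_aten):

# hypothetical addition to torchao/experimental/CMakeLists.txt
install(TARGETS torchao_ops_aten DESTINATION lib)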

@metascroy (Contributor, Author) replied on Oct 25, 2024:

If I understand correctly, you want the glob path to be something like:

"{CMAKE_INSTALL_PREFIX}/lib/libtorchao_ops_aten.*"

instead of

"../../../../cmake-out/third-party/ao/torchao/experimental/libtorchao_ops_aten.*"

How is the Python script going to know what CMAKE_INSTALL_PREFIX is? Do you want the user to define this as an environment variable?
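One possible resolution, sketched under the assumption that the user exports CMAKE_INSTALL_PREFIX as an environment variable (as the export command in the description already does):

import glob
import os

# hypothetical: fall back to the default build directory if the env var is unset
prefix = os.environ.get("CMAKE_INSTALL_PREFIX", "cmake-out")
libs = glob.glob(os.path.join(prefix, "lib", "libtorchao_ops_aten.*"))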

@facebook-github-bot merged commit a809953 into pytorch:main on Nov 11, 2024. 40 checks passed.