
Conversation

@metascroy (Contributor) commented on Oct 14, 2024:

Set up ET by following the directions here: https://pytorch.org/executorch/stable/getting-started-setup

After ET is set up, we can build the Llama runner with torchao kernels.

Step 1 (build ET):

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DEXECUTORCH_ENABLE_LOGGING=1 \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_EXTENSION_DATA_LOADER=ON \
    -DEXECUTORCH_BUILD_EXTENSION_MODULE=ON \
    -DEXECUTORCH_BUILD_EXTENSION_TENSOR=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_MPS=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -Bcmake-out .
cmake --build cmake-out -j16 --target install --config Release

Step 2 (build runner with torchao):

cmake -DPYTHON_EXECUTABLE=python \
    -DCMAKE_PREFIX_PATH=$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())') \
    -DCMAKE_INSTALL_PREFIX=cmake-out \
    -DCMAKE_BUILD_TYPE=Release \
    -DEXECUTORCH_BUILD_KERNELS_CUSTOM=ON \
    -DEXECUTORCH_BUILD_TORCHAO=ON \
    -DTORCHAO_BUILD_EXECUTORCH_OPS=ON \
    -DEXECUTORCH_BUILD_KERNELS_OPTIMIZED=ON \
    -DEXECUTORCH_BUILD_XNNPACK=ON \
    -DEXECUTORCH_BUILD_MPS=ON \
    -DEXECUTORCH_BUILD_KERNELS_QUANTIZED=ON \
    -Bcmake-out/examples/models/llama \
    examples/models/llama
cmake --build cmake-out/examples/models/llama -j16 --target install --config Release

Step 3 (install runner requirements):

sh examples/models/llama/install_requirements.sh

Step 4 (export model):

CMAKE_INSTALL_PREFIX=$PWD/cmake-out python -m examples.models.llama.export_llama --checkpoint /path/to/model.pth --params /path/to/params.json -kv --use_sdpa_with_kv_cache -qmode "torchao:8da3w" --group_size 128 -E "torchao:3,32" --disable_dynamic_shape --metadata '{"get_bos_id":128000, "get_eos_ids":[128009, 128001]}' --output_name /path/to/output.pte

The above quantizes the embeddings to 3 bits with group size 32 (-E "torchao:3,32") and quantizes the linear layers with 8-bit dynamically quantized activations and 3-bit weights with group size 128 (-qmode "torchao:8da3w"). You can experiment with other quantization schemes; 4-bit weights (instead of 3-bit) are a good starting point for model quality. torchao supports 1-7 bit quantization for both linear and embedding layers.
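For example, a 4-bit variant of the same export would swap in the flags below. (A sketch: the torchao:8da4w and torchao:4,32 spellings are extrapolated from the 3-bit flags above and the stated 1-7 bit support, so double-check them against the torchao docs.)

-qmode "torchao:8da4w" --group_size 128 -E "torchao:4,32"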

Step 5 (run model):

cmake-out/examples/models/llama/llama_main --model_path=/path/to/output.pte --tokenizer_path=/path/to/tokenizer.model --prompt="Once upon a time,"

Note: you can also export quantized PTE files with torchchat (https://github.com/pytorch/torchchat/blob/main/docs/quantization.md#executorch-1) and then run them using Step 5.

pytorch-bot (bot) commented on Oct 14, 2024:

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/6195

Note: Links to docs will display an error until the docs builds have been completed.

✅ No Failures

As of commit 444d44c with merge base ddc8ea6:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@facebook-github-bot added the CLA Signed label on Oct 14, 2024.
endif()

if(EXECUTORCH_BUILD_TORCHAO)
  list(APPEND link_libraries "$<LINK_LIBRARY:WHOLE_ARCHIVE,${CMAKE_CURRENT_BINARY_DIR}/../../../lib/libtorchao_ops_executorch.a>")
@metascroy (Contributor, Author) commented on Oct 14, 2024:

Exporting targets with config in torchao makes this bit nicer. It could be:

set(torchao_DIR ${CMAKE_CURRENT_BINARY_DIR}/../../../lib/cmake/torchao)
find_package(torchao REQUIRED)
target_link_options_shared_lib(torchao::torchao_ops_executorch)
list(APPEND link_libraries torchao::torchao_ops_executorch)
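For reference, a minimal sketch of the install/export that would enable the find_package call above (hypothetical target and path names; not something torchao ships today):

# in torchao/experimental/CMakeLists.txt (hypothetical)
install(TARGETS torchao_ops_executorch EXPORT torchao-targets DESTINATION lib)
install(
  EXPORT torchao-targets
  NAMESPACE torchao::
  FILE torchao-config.cmake
  DESTINATION lib/cmake/torchao
)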

A contributor commented:

You already called add_subdirectory so here you can do something like this:

target_link_options(torchao_ops_executorch INTERFACE -Wl,--whole-archive -Wl,--no-whole-archive)

Or actually this line should live in torchao/experimental/CMakeLists.txt.
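A minimal sketch of the force-load pattern under discussion, assuming GNU ld flags (Apple's linker would need -force_load instead):

# hypothetical: make any consumer of the target force-load the whole static
# archive so the op registrations are not dropped by the linker
target_link_options(torchao_ops_executorch INTERFACE
  "SHELL:LINKER:--whole-archive $<TARGET_FILE:torchao_ops_executorch> LINKER:--no-whole-archive"
)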

CMakeLists.txt (outdated)

if(EXECUTORCH_BUILD_TORCHAO)
  add_compile_options("-frtti")
  set(EXECUTORCH_INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/..)
@metascroy (Contributor, Author) commented:

The find_package(ExecuTorch) in torchao does not appear to define the variables EXECUTORCH_INCLUDE_DIRS and EXECUTORCH_LIBRARIES unless I run install_requirements.sh in torchao/experimental. But then I don't know if the op registration will work in examples/models/llama2.

A contributor commented:

I don't think these lines should live in the root-level CMakeLists.txt; it seems you only need them for the runner build, or the custom_ops build.

@metascroy force-pushed the torchao branch 2 times, most recently from ea358d5 to 3cc1d26, on October 23, 2024.
@facebook-github-bot commented:

@metascroy has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@metascroy requested a review from digantdesai on October 25, 2024.
@metascroy marked this pull request as ready for review on October 25, 2024.
@metascroy changed the title from "[Draft] torchao" to "Add torchao kernels to llama runner" on Oct 25, 2024.

if(EXECUTORCH_BUILD_TORCHAO)
  set(TORCHAO_BUILD_EXECUTORCH_OPS ON)
  add_subdirectory(${CMAKE_CURRENT_SOURCE_DIR}/../../../third-party/ao/torchao/experimental ${CMAKE_CURRENT_BINARY_DIR}/../../../third-party/ao/torchao/experimental)
A contributor commented:

I think you can do ${EXECUTORCH_ROOT}/third-party/ao/torchao/experimental

@metascroy (Contributor, Author) replied:

But one is the source directory and the other the binary directory?

The contributor replied:

Yeah, I meant that the first one can be made simpler. But it's up to you.
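In other words, a sketch of the suggested simplification (only the source-directory argument changes; the binary directory stays as-is):

add_subdirectory(
  ${EXECUTORCH_ROOT}/third-party/ao/torchao/experimental
  ${CMAKE_CURRENT_BINARY_DIR}/../../../third-party/ao/torchao/experimental
)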

os.path.abspath(
    os.path.join(
        os.path.dirname(__file__),
        "../../../../cmake-out/third-party/ao/torchao/experimental/libtorchao_ops_aten.*",
    )
)
A contributor commented:

This hardcoded path is not ideal. Can we add install() in torchao/experimental/CMakeLists.txt, and then find the installed library in CMAKE_INSTALL_PREFIX?
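A minimal sketch of what that install() could look like (hypothetical; assumes the library target is named torchao_ops_aten):

# hypothetical addition to torchao/experimental/CMakeLists.txt
install(TARGETS torchao_ops_aten DESTINATION lib)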

@metascroy (Contributor, Author) replied on Oct 25, 2024:

If I understand correctly, you want the glob path to be something like:

"{CMAKE_INSTALL_PREFIX}/lib/libtorchao_ops_aten.*"

instead of

"../../../../cmake-out/third-party/ao/torchao/experimental/libtorchao_ops_aten.*"

How is the Python script going to know what CMAKE_INSTALL_PREFIX is? Do you want the user to define this as an environment variable?
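One possible resolution, sketched under the assumption that the user exports CMAKE_INSTALL_PREFIX as an environment variable (as the export command in the description already does):

import glob
import os

# hypothetical: fall back to the default build directory if the env var is unset
prefix = os.environ.get("CMAKE_INSTALL_PREFIX", "cmake-out")
libs = glob.glob(os.path.join(prefix, "lib", "libtorchao_ops_aten.*"))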

@facebook-github-bot merged commit a809953 into pytorch:main on Nov 11, 2024. 40 checks passed.