Deprecate `iter_named_leaf_modules` and `iter_named_quantizable_modules` #1628
Conversation
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
👋 Hi! Thank you for contributing to llm-compressor. Please add the ready label when the PR is ready for review. Note: this is required to complete the testing suite, so please only add the label once the PR is code complete and local testing has been performed.
Summary of Changes
Hello @kylesayrs, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request primarily refactors the codebase by removing custom module iteration utilities and replacing them with standard PyTorch `Module` iteration methods. This change aims to simplify module traversal logic and align with PyTorch's native API. Additionally, the `Observer` base class has been updated to inherit from `compressed_tensors.InternalModule`, and minor code formatting and logic improvements have been applied.
Highlights
- Code Simplification: Replaced the custom module iteration functions (`iter_named_leaf_modules`, `iter_named_quantizable_modules`) with standard PyTorch `model.modules()` or `module.named_modules()` across the codebase. This streamlines module traversal and aligns with the native PyTorch API (see the sketch after this list).
- Base Class Update: The `Observer` class in `src/llmcompressor/observers/base.py` now inherits from `compressed_tensors.InternalModule` instead of `torch.nn.Module`, indicating a shift towards a more specialized base class within the `compressed_tensors` library.
- Sparsity Check Logic Refinement: Improved the `is_sparse24_bitmask_supported` function in `src/llmcompressor/transformers/compression/sparsity_metadata_config.py` to explicitly skip already quantized modules, preventing redundant checks and potentially improving efficiency.
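As a rough illustration of the first and third highlights, here is a minimal, hypothetical sketch of the replacement pattern: PyTorch's built-in `named_modules()` plus caller-side filtering stands in for the deprecated iterators. The `find_quantizable` helper and the `quantization_scheme` attribute check are illustrative assumptions, not the repository's actual code.

```python
from torch import nn

def find_quantizable(model: nn.Module):
    """Illustrative stand-in for the deprecated iterators: walk every
    submodule with PyTorch's built-in named_modules() and let the
    caller decide what counts as quantizable."""
    for name, module in model.named_modules():
        # skip modules that already carry a quantization scheme, mirroring
        # the refined sparsity check (attribute name is an assumption)
        if getattr(module, "quantization_scheme", None) is not None:
            continue
        # in this sketch, only Linear layers carry weights to quantize
        if isinstance(module, nn.Linear):
            yield name, module

model = nn.Sequential(nn.Linear(4, 4), nn.ReLU(), nn.Linear(4, 2))
print([name for name, _ in find_quantizable(model)])  # ['0', '2']
```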
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double-check it and use code with caution.
Code Review
The changes replace custom module iterators with standard PyTorch iteration. However, the replacements introduce bugs by failing to replicate the original filtering logic, particularly regarding leaf modules and specific quantizable modules. Critical issues in `sparsity_metadata_config.py` and the `kv_cache` tests need to be addressed.
Review thread on `src/llmcompressor/transformers/compression/sparsity_metadata_config.py` (outdated; resolved)
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
This reverts commit 62e93a7.
Signed-off-by: Kyle Sayers <kylesayrs@gmail.com>
🚀
Discussed with @kylesayrs that mapping all `iter_named_leaf_modules` uses, regardless of whether `include_attn`/`include_children` are False or True, to the torch built-in `.named_modules()` method should not affect overall behavior.
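A toy check of that claim (a sketch, not the project's code): `named_modules()` yields a superset of any leaf-only iteration, so callers that filter by module type see the same modules either way.

```python
from torch import nn

model = nn.Sequential(nn.Linear(8, 8), nn.Sequential(nn.Linear(8, 4), nn.ReLU()))

# named_modules() yields containers *and* leaves...
all_named = dict(model.named_modules())

# ...so a leaf-only view is recoverable by filtering, which is why swapping
# a leaf-only iterator for named_modules() preserves behavior for callers
# that already filter by module type.
leaves = {n: m for n, m in all_named.items() if len(list(m.children())) == 0}
print(sorted(leaves))  # ['0', '1.0', '1.1']
```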
Purpose
- Support `TransformBase` modules

Prerequisites
- neuralmagic/compressed-tensors#381 (deprecate `iter_named_leaf_modules` and `iter_named_quantizable_modules`)

Changes
- Remove uses of `iter_named_leaf_modules` and `iter_named_quantizable_modules`
- Make `Observer` inherit from `InternalModule` (sketched below)
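For reference, a minimal sketch of that inheritance change, assuming `compressed_tensors` exposes `InternalModule` as named in the summary above; the body shown is illustrative, not the actual `Observer` implementation.

```python
from compressed_tensors import InternalModule  # import path per the summary above

class Observer(InternalModule):  # previously inherited from torch.nn.Module
    """Illustrative skeleton only: InternalModule presumably lets standard
    traversal utilities recognize and skip helper modules like observers."""

    def __init__(self):
        super().__init__()
```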
Testing