Description
@BenjaminBossan I am trying to add dynamic LoRA support to both vLLM and SGLang, since LoraConfig already supports this kind of dynamic control via the following variables:
- rank_pattern: regex matching that decides which modules get a different r/rank value
- exclude_modules: regex matching that decides which modules are excluded from LoRA completely
- alpha_pattern: regex matching for the alpha override. Exactly the same as rank_pattern but for a different property.
There is nothing wrong with them individually, but together they are unnecessarily detached from one another, which has a negative impact on code cost and also on dynamic control efficiency.
GPTQModel uses a single dynamic: Dict[str, Dict[]] where the str is a regex with an optional prefix: +: (positive, optional) or -: (negative, optional).
The dict value is the property override in string: value format.
Example as applied to PEFT (Proposal):
# implicit +: prefix if not used
# prefixes are stripped before the regex is applied
"mlp\.down_proj": { "r": 128 } # implicit positive
"+:mlp\.down_proj": { "r": 256 } # explicit positive
"-:mlp\.gate_proj": {} # negative
This simple control allows 3 states.
- Positive match == override any property values in base config (LoraConfig).
- Negative match == skip this module for LoRA (no LoraConfig at all).
- No match == no pattern matched the module, so the base LoraConfig is used.
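To make the three states concrete, here is a minimal sketch of how a resolver could work. It is not code from PEFT or GPTQModel: resolve_dynamic is a hypothetical helper, the base config is a plain dict standing in for LoraConfig, and it assumes that patterns are tried in insertion order with the first match winning and that re.search is used for matching. The real semantics would need to be agreed on.

```python
import re
from typing import Any, Dict, Optional


def resolve_dynamic(
    module_name: str,
    base: Dict[str, Any],
    dynamic: Dict[str, Dict[str, Any]],
) -> Optional[Dict[str, Any]]:
    """Hypothetical helper: return the effective settings for a module.

    Returns None when the module is skipped (negative match).
    Assumption: the first matching pattern wins, in insertion order.
    """
    for pattern, overrides in dynamic.items():
        negative = pattern.startswith("-:")
        # Strip the optional "+:" / "-:" prefix before applying the regex.
        if pattern.startswith(("+:", "-:")):
            pattern = pattern[2:]
        if re.search(pattern, module_name) is None:
            continue
        if negative:
            return None  # negative match: no LoRA on this module
        return {**base, **overrides}  # positive match: override base values
    return dict(base)  # no match: base settings apply unchanged


base = {"r": 64, "lora_alpha": 128}
dynamic = {
    r"mlp\.down_proj": {"r": 128},  # implicit positive
    r"-:mlp\.gate_proj": {},        # negative
}

down = resolve_dynamic("model.layers.0.mlp.down_proj", base, dynamic)
gate = resolve_dynamic("model.layers.0.mlp.gate_proj", base, dynamic)
attn = resolve_dynamic("model.layers.0.self_attn.q_proj", base, dynamic)
```

Here down carries the overridden rank, gate is None (skipped), and attn is just a copy of the base settings. Any property in the base dict can be overridden the same way without extra per-property logic.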
This single control replaces all of the existing PEFT controls with the same functionality, while allowing ALL properties to be dynamically overridden (if necessary) without any additional APIs or LoraConfig vars. As it stands, you need to add code and logic for every LoraConfig property that participates in dynamic override/control.
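As a rough illustration of that equivalence, the sketch below folds the three existing fields into one dynamic dict. to_dynamic is a hypothetical helper, not a PEFT API, and it glosses over PEFT's exact matching rules for each field (it simply reuses each key as the regex).

```python
from typing import Any, Dict, List, Optional, Union


def to_dynamic(
    rank_pattern: Optional[Dict[str, int]] = None,
    alpha_pattern: Optional[Dict[str, int]] = None,
    exclude_modules: Optional[Union[str, List[str]]] = None,
) -> Dict[str, Dict[str, Any]]:
    """Hypothetical helper: merge the three separate fields into one dict."""
    dynamic: Dict[str, Dict[str, Any]] = {}
    # Rank and alpha overrides for the same pattern share a single entry.
    for pattern, r in (rank_pattern or {}).items():
        dynamic.setdefault(pattern, {})["r"] = r
    for pattern, alpha in (alpha_pattern or {}).items():
        dynamic.setdefault(pattern, {})["lora_alpha"] = alpha
    # Excluded modules become negative ("-:") entries with no overrides.
    if isinstance(exclude_modules, str):
        exclude_modules = [exclude_modules]
    for pattern in exclude_modules or []:
        dynamic["-:" + pattern] = {}
    return dynamic


merged = to_dynamic(
    rank_pattern={r"mlp\.down_proj": 128},
    alpha_pattern={r"mlp\.down_proj": 256},
    exclude_modules=[r"mlp\.gate_proj"],
)
```

With these inputs, merged holds one positive entry for mlp\.down_proj carrying both r and lora_alpha, plus one negative entry for mlp\.gate_proj.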
Basically, I want PEFT's LoraConfig to be the clean standard for vLLM and SGLang when it comes to dynamic control. Having a unified dynamic override system makes everyone's life so much easier, and at the same time eliminates the issue of having to write code each time a new LoraConfig property comes into place.
Let me know what you think. I am willing to spend time working on it. You can also reach me at qubitium@modelcloud.ai and on X: qubitium. I really would love to chat with you for like 15 minutes to ping-pong this idea with you.
CC: @SunMarc @MekkCyber