
Error Using Llama-2 with Fine-Tuned LoRA Adapters: Tensor Size Mismatch in apply_rotary_pos_emb Function #147

@montygole

Description


I encountered a runtime error while using the transformers-interpret library with a fine-tuned Llama-2 model that includes LoRA adapters for sequence classification. The error occurs when invoking the SequenceClassificationExplainer and appears to be caused by a tensor size mismatch while the rotary positional embeddings are applied.

Traceback (most recent call last):
  File "/home/input_attr_proj/src/input_attr.py", line 32, in <module>
    word_attributions = cls_explainer("Hello")
  File "/home/input_attr_env/lib/python3.10/site-packages/transformers_interpret/explainers/text/sequence_classification.py", line 316, in __call__
    return self._run(text, index, class_name, embedding_type=embedding_type)
File "/home/input_attr_env/lib/python3.10/site-packages/transformers_interpret/explainers/text/sequence_classification.py", line 270, in _run
    self._calculate_attributions(embeddings=embeddings, index=index, class_name=class_name)
  File "/home/input_attr_env/lib/python3.10/site-packages/transformers_interpret/explainers/text/sequence_classification.py", line 226, in _calculate_attributions
    lig = LIGAttributions(
  File "/home/input_attr_env/lib/python3.10/site-packages/transformers_interpret/attributions.py", line 51, in __init__
    self._attributions, self.delta = self.lig.attribute(
  File "/home/input_attr_env/lib/python3.10/site-packages/captum/log/__init__.py", line 42, in wrapper
    return func(*args, **kwargs)
  File "/home/input_attr_env/lib/python3.10/site-packages/captum/attr/_core/layer/layer_integrated_gradients.py", line 390, in attribute
    baselines_layer = _forward_layer_eval(
  File "/home/input_attr_env/lib/python3.10/site-packages/captum/_utils/gradient.py", line 182, in _forward_layer_eval
    return _forward_layer_eval_with_neuron_grads(
  File "/home/input_attr_env/lib/python3.10/site-packages/captum/_utils/gradient.py", line 445, in _forward_layer_eval_with_neuron_grads
    saved_layer = _forward_layer_distributed_eval(
  File "/home/input_attr_env/lib/python3.10/site-packages/captum/_utils/gradient.py", line 294, in _forward_layer_distributed_eval
    output = _run_forward(
  File "/home/input_attr_env/lib/python3.10/site-packages/captum/_utils/common.py", line 531, in _run_forward
    output = forward_func(
  File "/home/input_attr_env/lib/python3.10/site-packages/transformers_interpret/explainers/text/sequence_classification.py", line 181, in _forward
    preds = self._get_preds(input_ids, token_type_ids, position_ids, attention_mask)
  File "/home/input_attr_env/lib/python3.10/site-packages/transformers_interpret/explainer.py", line 197, in _get_preds
    preds = self.model(
  File "/home/input_attr_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/input_attr_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/input_attr_env/lib/python3.10/site-packages/accelerate/hooks.py", line 166, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/input_attr_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 1352, in forward
    transformer_outputs = self.model(
  File "/home/input_attr_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/input_attr_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/input_attrlib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 968, in forward
    layer_outputs = decoder_layer(
  File "/home/input_attr_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/input_attr_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/input_attr_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 713, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/input_attr_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/input_attr_env/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/input_attr_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 624, in forward
    query_states, key_states = apply_rotary_pos_emb(query_states, key_states, cos, sin)
  File "/home/input_attr_env/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 182, in apply_rotary_pos_emb
    q_embed = (q * cos) + (rotate_half(q) * sin)
RuntimeError: The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 2
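
For reference, the failing statement is a plain element-wise multiply, so the message just says that the query tensor and the cos/sin tensors disagree in the sequence-length dimension (3 vs. 2). Below is a standalone sketch of the same broadcast failure, with shapes chosen only to match the error message; the head count and head dimension are illustrative, not read from the model.

import torch

# (batch, num_heads, seq_len, head_dim): the query tensor has sequence length 3
q = torch.randn(1, 32, 3, 128)
# cos/sin were built for sequence length 2, hence the clash at dimension 2
cos = torch.randn(1, 1, 2, 128)

try:
    q * cos  # same broadcast as q_embed = (q * cos) + (rotate_half(q) * sin)
except RuntimeError as err:
    print(err)
    # -> The size of tensor a (3) must match the size of tensor b (2) at non-singleton dimension 2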

Code sample:

from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers_interpret import SequenceClassificationExplainer

id2label = {0: "No", 1: "Yes"}
label2id = {"No": 0, "Yes": 1}

model = AutoModelForSequenceClassification.from_pretrained(
    "outputs/2024-04-21/04-27-20/outputs/checkpoint-2564/",
    device_map="auto",
    num_labels=2,
    id2label=id2label,
    label2id=label2id,
)
# model_name refers to the base Llama-2 checkpoint used for fine-tuning; it is not defined in this snippet.
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.pad_token = tokenizer.eos_token

cls_explainer = SequenceClassificationExplainer(model, tokenizer)
word_attributions = cls_explainer("Hello")
print(word_attributions)
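
To see where the two sequence lengths diverge, a forward pre-hook on the first attention block can print the shapes that the explainer actually feeds the model. This is only a debugging sketch that reuses model and cls_explainer from the snippet above: the attribute path model.model.layers[0].self_attn assumes a plain LlamaForSequenceClassification, and a PEFT-wrapped model nests it deeper (e.g. under base_model.model).

attn = model.model.layers[0].self_attn  # assumed path for a plain LlamaForSequenceClassification

def log_shapes(module, args, kwargs):
    # The decoder layer passes hidden_states and position_ids as keyword arguments.
    hidden_states = kwargs.get("hidden_states", args[0] if args else None)
    position_ids = kwargs.get("position_ids")
    print("hidden_states:", tuple(hidden_states.shape))
    if position_ids is not None:
        print("position_ids:", tuple(position_ids.shape))

handle = attn.register_forward_pre_hook(log_shapes, with_kwargs=True)
try:
    cls_explainer("Hello")
except RuntimeError as err:
    print("failed with:", err)
finally:
    handle.remove()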

Additional Context:

The error occurs inside the apply_rotary_pos_emb function and points to a tensor size mismatch. This might be caused by the way the LoRA adapters are integrated with the Llama-2 model. Any help resolving this issue, or guidance on proper compatibility between transformers-interpret and LoRA-adapted Llama-2 models, would be greatly appreciated.
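
One workaround worth trying (not verified in this issue) is to merge the LoRA adapters into the base weights with PEFT before constructing the explainer, so the attribution runs against a plain LlamaForSequenceClassification with no adapter wrapper modules in the forward pass. The base model name below is a placeholder for whichever checkpoint the adapters were trained on.

from peft import PeftModel
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from transformers_interpret import SequenceClassificationExplainer

base_model_name = "meta-llama/Llama-2-7b-hf"  # placeholder: the base checkpoint the LoRA adapters were trained on
adapter_path = "outputs/2024-04-21/04-27-20/outputs/checkpoint-2564/"

base = AutoModelForSequenceClassification.from_pretrained(
    base_model_name,
    num_labels=2,
    id2label={0: "No", 1: "Yes"},
    label2id={"No": 0, "Yes": 1},
)
# Attach the fine-tuned LoRA weights, then fold them into the base weights so
# no PEFT modules remain in the forward pass.
model = PeftModel.from_pretrained(base, adapter_path).merge_and_unload()

tokenizer = AutoTokenizer.from_pretrained(base_model_name)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id

cls_explainer = SequenceClassificationExplainer(model, tokenizer)
print(cls_explainer("Hello"))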
