
[Bug]: Loading the 0.5B PP-UIE with AutoModelForCausalLM consumes 56 GB of NPU memory #11094

@zhiyongLiu1114

Description


Software environment

- paddlepaddle: 3.2.0
- paddle-custom-npu: 3.2.0
- paddlenlp: 3.0.0b4
- NPU: 910B
- Server: Ascend

Duplicate issues

  • I have searched the existing issues

Bug description

On an otherwise empty 910B NPU, loading the 0.5B PP-UIE with AutoModelForCausalLM consumes 56 GB of NPU memory. Running the same script a second time to load the 0.5B PP-UIE consumes only 4 GB (because that is all a 910B has left after the first 56 GB is taken), yet inference still works normally.

Steps to reproduce & code

from paddlenlp.transformers import AutoModelForCausalLM
from paddlenlp.transformers import AutoTokenizer
from paddlenlp.generation import GenerationConfig

model_id = "paddlenlp/PP-UIE-0.5B"

# On an empty 910B, this single load consumes 56 GB of NPU memory.
model = AutoModelForCausalLM.from_pretrained(model_id, use_flash_attention=False)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_id, padding_side="left")
generation_config = GenerationConfig.from_pretrained(model_id)
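As an aside on the snippet above: the tokenizer is created with padding_side="left", which is the usual choice for decoder-only generation, since it keeps every sequence's real tokens adjacent to the position where new tokens are generated. A minimal pure-Python sketch of the effect (independent of PaddleNLP; `left_pad` and `pad_id` are illustrative names, not library API):

```python
def left_pad(batch, pad_id=0):
    """Left-pad variable-length token-id sequences to equal length,
    mimicking what a tokenizer with padding_side="left" produces."""
    width = max(len(seq) for seq in batch)
    return [[pad_id] * (width - len(seq)) + list(seq) for seq in batch]

padded = left_pad([[11, 12, 13], [21, 22]], pad_id=0)
print(padded)  # [[11, 12, 13], [0, 21, 22]]
```

With right padding the shorter sequence would end in pad tokens, and the model would be asked to continue from a pad position; left padding avoids that.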

Metadata

Labels

bug: Something isn't working
