
sequence_parallel_attention makes the model's _update_causal_mask take the wrong path? #75

@PrometheusComing

Description

Reminder

  • I have read the README and searched the existing issues.

System Info

On the sp branch, when the transformers version is > 4.51.0, sequence_parallel_attention gets registered as a custom attention implementation. However, in the model's forward path, _update_causal_mask contains a check like "if self.config._attn_implementation == "flash_attention_2":". Since the implementation string is now "sequence_parallel_attention" rather than "flash_attention_2", that branch is skipped and the attention mask is expanded from 2D to 4D. I think this is a bug; can you help me?
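For illustration, here is a minimal, self-contained sketch of the behavior described above. It is a paraphrase, not the actual transformers source; the function and helper names are made up for this example. Only the exact string "flash_attention_2" keeps the 2D padding mask, so any registered custom name such as "sequence_parallel_attention" falls through to the branch that builds a 4D causal mask.

```python
# Illustrative sketch only -- not the upstream transformers code. It mimics the
# branching inside _update_causal_mask: the 2D mask is kept only when
# _attn_implementation is exactly "flash_attention_2"; any other name, including
# a registered custom one, gets expanded into a 4D causal mask.
import torch


def update_causal_mask_sketch(attn_implementation, attention_mask, seq_len,
                              dtype=torch.float32):
    if attn_implementation == "flash_attention_2":
        # flash-attn kernels consume the 2D padding mask (or None) directly
        return attention_mask
    # every other implementation string falls through to the 4D mask builder
    min_value = torch.finfo(dtype).min
    causal = torch.triu(torch.full((seq_len, seq_len), min_value, dtype=dtype),
                        diagonal=1)
    causal = causal[None, None, :, :].expand(attention_mask.shape[0], 1, -1, -1).clone()
    # merge the per-token padding mask into the 4D mask
    return causal.masked_fill(attention_mask[:, None, None, :] == 0, min_value)


mask_2d = torch.ones(2, 8, dtype=torch.long)
print(update_causal_mask_sketch("flash_attention_2", mask_2d, 8).shape)
# torch.Size([2, 8])
print(update_causal_mask_sketch("sequence_parallel_attention", mask_2d, 8).shape)
# torch.Size([2, 1, 8, 8])
```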

Reproduction

Just read the code; the problem is visible from inspection alone. A reproduction sketch is included below.
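For a concrete check, here is a hedged reproduction sketch. It assumes the sp branch registers its kernel through transformers' AttentionInterface; the logging wrapper simply delegates to the built-in SDPA forward, and the model choice (Qwen/Qwen2.5-0.5B) is only illustrative, not the actual sp-branch setup.

```python
# Reproduction sketch (assumptions: the custom kernel is registered via
# transformers' AttentionInterface, and Qwen/Qwen2.5-0.5B stands in for the
# real model; the logging wrapper below is illustrative, not the sp-branch code).
import torch
from transformers import AttentionInterface, AutoModelForCausalLM, AutoTokenizer
from transformers.integrations.sdpa_attention import sdpa_attention_forward


def sequence_parallel_attention(module, query, key, value, attention_mask, **kwargs):
    # Because _attn_implementation is no longer "flash_attention_2", the mask
    # handed to this function arrives as 4D instead of 2D (or None).
    if attention_mask is not None:
        print("attention_mask ndim:", attention_mask.ndim,
              "shape:", tuple(attention_mask.shape))
    return sdpa_attention_forward(module, query, key, value, attention_mask, **kwargs)


AttentionInterface.register("sequence_parallel_attention", sequence_parallel_attention)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen2.5-0.5B", attn_implementation="sequence_parallel_attention"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B")
inputs = tokenizer("hello world", return_tensors="pt")
with torch.no_grad():
    model(**inputs)  # prints a 4D mask shape, e.g. (1, 1, seq_len, seq_len)
```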

Expected behavior

No response

Others

No response
