
Hello, I have some questions about transformers and flash-attn; can you give me some advice? #8

@lch-github123

Description


The requirements pin transformers <4.43, but with that version loading the llama3 model fails with ValueError: rope_scaling must be a dictionary with two fields. If I switch to transformers >= 4.43 instead, I run into a flash-attn problem:
[rank0]: RuntimeError: CUDA error: an illegal memory access was encountered
[rank0]: CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
[rank0]: For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
[rank0]: Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
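In case it helps while debugging: a workaround that is often shared for the transformers < 4.43 side of this conflict is to trim the checkpoint's rope_scaling entry in config.json down to the two fields the older validator expects. The path and the stand-in "dynamic" type below are assumptions, and the edit changes the llama3 long-context RoPE behavior, so treat this as a sketch rather than this repo's official fix:

```python
# Hedged sketch of a commonly shared workaround (not this repo's official fix):
# transformers < 4.43 validates rope_scaling as a dict with exactly two keys,
# "type" and "factor", while Llama 3.1 checkpoints ship extra keys such as
# "rope_type", "low_freq_factor", and "high_freq_factor". Trimming a local
# copy of config.json to the two-key form silences the ValueError, at the
# cost of losing the llama3-specific RoPE scaling behavior.
import json
from pathlib import Path

# Hypothetical path to a local copy of the checkpoint; adjust to your setup.
config_path = Path("/path/to/llama3-checkpoint/config.json")
config = json.loads(config_path.read_text())

rope_scaling = config.get("rope_scaling") or {}
if rope_scaling:
    config["rope_scaling"] = {
        "type": "dynamic",                           # assumed stand-in scaling type
        "factor": rope_scaling.get("factor", 8.0),   # keep the checkpoint's factor
    }
    config_path.write_text(json.dumps(config, indent=2))
    print("Patched rope_scaling:", config["rope_scaling"])
```

For the transformers >= 4.43 path, the illegal-memory-access error often points to a flash-attn wheel built against a different torch/CUDA version than the one installed, so checking that combination may be worth trying before changing code.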
