Multi-head attention #2

@Walterkd

The README says "the DiT Block uses 3-head attention". Shouldn't that be 4-head attention?
train.py sets head=4.
