I figured out how to cram GPT-2 1.5B onto a single TPU core with Adam optimizer #23

@shawwn

It comes down to tensor shape. 2D = good, 3D = bad.

Relevant commit: shawwn/gpt-2@4d766e9
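
For anyone wondering what "keeping it 2D" can look like in practice, here is a minimal NumPy sketch. It is not taken from the commit above and may differ from what the commit actually does. It assumes the 3D tensor is an activation of shape [batch, seq, n_in] feeding a dense layer; the name `dense_2d` and the shapes are made up for illustration. The idea is to collapse the leading dimensions, do one 2D matmul, and restore the shape only at the end.

```python
import numpy as np

def dense_2d(x, w, b):
    """Hypothetical helper: apply y = x @ w + b to a [batch, seq, n_in] input
    using a single 2D matrix multiply."""
    batch, seq, n_in = x.shape
    x2d = x.reshape(batch * seq, n_in)           # 3D -> 2D: [batch*seq, n_in]
    y2d = x2d @ w + b                            # plain 2D matmul plus bias
    return y2d.reshape(batch, seq, w.shape[1])   # back to 3D for the caller

# Illustrative shapes only; real GPT-2 1.5B dimensions are far larger.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 4, 8))
w = rng.standard_normal((8, 16))
b = rng.standard_normal(16)

y = dense_2d(x, w, b)

# Reference result computed directly on the 3D tensor.
reference = np.einsum("bsi,io->bso", x, w) + b
```

The two paths give the same numbers, so flattening changes only the shape the matmul sees, not the math. Whether that helps memory on a TPU core depends on how the compiler lays out each shape, which this sketch does not measure.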
