
Rough runtime benchmarks across tasks and context lengths on any hardware setup #76

@girishbalaji

Description


Does anyone have rough numbers, on any hardware setup, for the inference runtime of standard models across tasks and various context lengths?

Using vLLM, for just 5 requests at the 131072 context length on NIAH_single_1, I'm currently seeing ~15 minutes on a single A100 for Llama 3.1 8B.
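
For concreteness, the rough shape of a timing harness for this looks like the sketch below. Treat it as a minimal sketch: the model id, the memory/length knobs, and the stand-in prompts are illustrative assumptions, not the exact NIAH_single_1 inputs.

```python
import time
from vllm import LLM, SamplingParams

llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # assumed HF model id; swap in yours
    max_model_len=131072,                      # full 128K context window
    gpu_memory_utilization=0.95,               # assumption: tune for your GPU
    tensor_parallel_size=1,                    # 1 = single A100
)

# NIAH answers are short, so a small generation budget is enough.
sampling = SamplingParams(temperature=0.0, max_tokens=128)

# Stand-in prompts: a real run would use the 5 NIAH_single_1 prompts.
# The actual token count of this filler depends on the tokenizer.
prompts = ["haystack " * 30_000] * 5

start = time.perf_counter()
outputs = llm.generate(prompts, sampling)
elapsed = time.perf_counter() - start
print(f"{len(prompts)} requests in {elapsed / 60:.1f} min")
```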

While I actively experiment with parallelism configs (see the sketch below), I'm wondering whether other folks have seen rough order-of-magnitude runtime numbers across various tasks, context lengths, and hardware setups.
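
The parallelism knobs I mean are the standard vLLM engine arguments; a multi-GPU variant of the harness above might look like this (values are illustrative and require more than one GPU):

```python
from vllm import LLM

# Same harness as above, but sharded across GPUs (illustrative values).
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    max_model_len=131072,
    tensor_parallel_size=2,    # shard each layer's weights across 2 GPUs
    pipeline_parallel_size=1,  # keep all layers in a single pipeline stage
)
```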
