[QST] Can synchronized TensorCore MMA operations overlap with CUDA Core operations in a single thread? #1821

@phantaurus

What is your question?
Hello,

I am curious whether synchronized TensorCore operations like mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 can run in parallel with non-TensorCore operations such as hexp2 within the same thread, assuming there is no data dependency between them.

Since these operations use different execution pipelines, I would expect them to overlap when no data dependencies exist. However, my experimental results suggest otherwise: the two do not appear to run in parallel when both are issued from the same thread.
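
For reference, a minimal sketch of the kind of microbenchmark that could test this. It is an assumption, not the experiment behind the results above: the kernel name `probe`, the `MODE` template flag, the helper `run`, and the iteration count are all hypothetical. One warp runs a dependent chain of mma.sync instructions, a dependent chain of hexp2 calls, or both chains interleaved, and the elapsed cycles are compared. If the two independent chains overlap, the combined time should be close to the larger of the two individual times rather than their sum.

```cuda
#include <cstdio>
#include <cuda_fp16.h>

// Hypothetical probe kernel. MODE bit 0: issue mma.sync. MODE bit 1: issue hexp2.
// The two chains share no registers, so there is no data dependency between them.
template <int MODE>
__global__ void probe(long long* cycles, unsigned* sink, int iters) {
    // A fragment: four 32-bit registers, each holding half2(1.0, 1.0).
    unsigned a0 = 0x3c003c00u, a1 = a0, a2 = a0, a3 = a0;
    // B fragment: zeros, so the f16 accumulators stay finite.
    unsigned b0 = 0u, b1 = 0u;
    // C/D fragment: two 32-bit registers, accumulated in place.
    unsigned d0 = 0u, d1 = 0u;
    __half x = __float2half(1.0f);

    long long start = clock64();
    #pragma unroll 1
    for (int i = 0; i < iters; ++i) {
        if (MODE & 1) {
            asm volatile(
                "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
                "{%0,%1}, {%2,%3,%4,%5}, {%6,%7}, {%0,%1};\n"
                : "+r"(d0), "+r"(d1)
                : "r"(a0), "r"(a1), "r"(a2), "r"(a3), "r"(b0), "r"(b1));
        }
        if (MODE & 2) {
            // Negating the input keeps the chain bounded instead of overflowing.
            x = hexp2(__hneg(x));
        }
    }
    long long stop = clock64();

    // Write every result out so the compiler cannot discard either chain.
    sink[threadIdx.x] = d0 ^ d1 ^ __half_as_ushort(x);
    if (threadIdx.x == 0) cycles[0] = stop - start;
}

// Launch one warp in the given mode and return the measured cycle count.
template <int MODE>
long long run(int iters) {
    long long* d_cycles = nullptr;
    unsigned* d_sink = nullptr;
    cudaMalloc(&d_cycles, sizeof(long long));
    cudaMalloc(&d_sink, 32 * sizeof(unsigned));
    probe<MODE><<<1, 32>>>(d_cycles, d_sink, iters);
    long long h_cycles = 0;
    cudaMemcpy(&h_cycles, d_cycles, sizeof(h_cycles), cudaMemcpyDeviceToHost);
    cudaFree(d_cycles);
    cudaFree(d_sink);
    return h_cycles;
}

int main() {
    const int iters = 4096;  // arbitrary choice for illustration
    run<3>(iters);           // warm-up launch, result ignored
    printf("mma only : %lld cycles\n", run<1>(iters));
    printf("exp2 only: %lld cycles\n", run<2>(iters));
    printf("both     : %lld cycles\n", run<3>(iters));
    return 0;
}
```

This needs a GPU and build target of sm_80 or newer (for example `nvcc -arch=sm_80`). Because each chain is dependent on its own previous result, the numbers reflect latency rather than throughput. It is also worth checking the generated SASS (for example with `cuobjdump -sass`) to confirm that the exp2 instructions actually sit inside the timed loop next to the MMA instructions, since the compiler is free to reorder non-volatile code.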

Thank you so much!
