Hello,
I am curious whether warp-synchronous Tensor Core instructions such as `mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16` can execute in parallel with non-Tensor Core operations such as `hexp2` issued from the same thread, assuming there is no data dependency between them.
Since these operations run on different execution pipelines, I would expect them to overlap when there is no data dependency. However, my experiments suggest otherwise: when both are issued from the same thread, they do not appear to execute in parallel.
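For reference, a minimal single-warp microbenchmark for this kind of comparison might look like the sketch below. It is not the code behind the results above. It assumes an sm_80 or newer GPU (the f16 `m16n8k16` shape needs sm_80+) and a build like `nvcc -arch=sm_80`. The names `bench`, `mma_m16n8k16`, `run_mode` and `kIters` are hypothetical. Because `mma.sync` is warp-collective, "the same thread" here effectively means one warp's instruction stream. Each mode times a dependent `mma.sync` chain, a dependent `hexp2` chain, or both chains interleaved with no link between them. If the pipelines overlap, the "both" time should approach the larger single-chain time. If they serialize, it should approach the sum. ptxas may reschedule instructions, including across the `clock64()` reads, so the SASS (`cuobjdump -sass`) should be inspected before trusting the numbers.

```cuda
#include <cstdint>
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

constexpr int kIters = 1024;  // hypothetical iteration count

// D = A * B + C with C aliased to D, so each call depends on the previous one.
__device__ __forceinline__ void mma_m16n8k16(uint32_t (&d)[2],
                                             const uint32_t (&a)[4],
                                             const uint32_t (&b)[2]) {
  asm volatile(
      "mma.sync.aligned.m16n8k16.row.col.f16.f16.f16.f16 "
      "{%0, %1}, {%2, %3, %4, %5}, {%6, %7}, {%8, %9};\n"
      : "=r"(d[0]), "=r"(d[1])
      : "r"(a[0]), "r"(a[1]), "r"(a[2]), "r"(a[3]),
        "r"(b[0]), "r"(b[1]), "r"(d[0]), "r"(d[1]));
}

// Mode 0: mma chain only, 1: hexp2 chain only, 2: both chains interleaved.
template <int Mode>
__global__ void bench(const uint32_t* in, float seed, uint32_t* out,
                      __half* xout, long long* cycles) {
  const uint32_t a[4] = {in[0], in[1], in[2], in[3]};
  const uint32_t b[2] = {in[4], in[5]};
  uint32_t d[2] = {0u, 0u};
  __half x = __float2half(seed);

  const long long t0 = clock64();
#pragma unroll 16
  for (int i = 0; i < kIters; ++i) {
    if (Mode != 1) mma_m16n8k16(d, a, b);  // dependent chain through d
    if (Mode != 0) x = hexp2(x);           // dependent chain through x only
  }
  const long long t1 = clock64();

  // Store both chains' results so the compiler cannot eliminate them.
  out[2 * threadIdx.x] = d[0];
  out[2 * threadIdx.x + 1] = d[1];
  xout[threadIdx.x] = x;
  if (threadIdx.x == 0) *cycles = t1 - t0;
}

// Launches one warp (mma.sync needs all 32 lanes) and returns lane 0's cycles.
// Error checking is omitted for brevity.
template <int Mode>
long long run_mode() {
  // Every f16 lane is 1.0 (0x3C00); the values only need to be runtime inputs.
  const uint32_t h_in[6] = {0x3C003C00u, 0x3C003C00u, 0x3C003C00u,
                            0x3C003C00u, 0x3C003C00u, 0x3C003C00u};
  uint32_t *d_in = nullptr, *d_out = nullptr;
  __half* d_xout = nullptr;
  long long* d_cycles = nullptr;
  cudaMalloc(&d_in, sizeof(h_in));
  cudaMalloc(&d_out, 32 * 2 * sizeof(uint32_t));
  cudaMalloc(&d_xout, 32 * sizeof(__half));
  cudaMalloc(&d_cycles, sizeof(long long));
  cudaMemcpy(d_in, h_in, sizeof(h_in), cudaMemcpyHostToDevice);
  for (int rep = 0; rep < 2; ++rep)  // the first launch is a warm-up
    bench<Mode><<<1, 32>>>(d_in, 0.5f, d_out, d_xout, d_cycles);
  long long cycles = 0;
  cudaMemcpy(&cycles, d_cycles, sizeof(cycles), cudaMemcpyDeviceToHost);
  cudaFree(d_in);
  cudaFree(d_out);
  cudaFree(d_xout);
  cudaFree(d_cycles);
  return cycles;
}

int main() {
  const long long t_mma = run_mode<0>();
  const long long t_exp = run_mode<1>();
  const long long t_both = run_mode<2>();
  std::printf("mma only:   %lld cycles\n", t_mma);
  std::printf("hexp2 only: %lld cycles\n", t_exp);
  std::printf("both:       %lld cycles (sum %lld, max %lld)\n", t_both,
              t_mma + t_exp, t_mma > t_exp ? t_mma : t_exp);
  return 0;
}
```

If "both" lands near the sum rather than the max, the SASS for mode 2 is the first thing to check. It shows whether the `HMMA` and exp2-related instructions were actually interleaved, or whether the compiler grouped them.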
Thank you so much!