Skipping work by working on only sub-matrices #191

RSchwan · 2025-02-07T10:56:50Z

Hi,

I have a very particular problem: I have matrices in which certain blocks are known to be zero, so I should be able to save some work by skipping those blocks. In particular, I'm currently looking at the blasfeo_dtrsm_rltn and blasfeo_dsyrk_ln kernels.

Let's assume I have a dense matrix $$A\in\mathbb{R}^{n \times n}$$ and a matrix $$B\in\mathbb{R}^{m \times n}$$, where $$B$$ has the following structure:

B = [ 0 ... 0 | b ... b ]
    [ 0 ... 0 | b ... b ]
    [   ...   |   ...   ]
    [ 0 ... 0 | b ... b ]
    [ ------- | ------- ]
    [ 0 ... 0 | 0 ... 0 ]
    [   ...   |   ...   ]
    [ 0 ... 0 | 0 ... 0 ]

i.e., only the top-right block of $$B$$ is nonzero and the rest is zero. If I then run the blasfeo_dtrsm_rltn kernel to calculate C = B * A^{-T}, the result $$C\in\mathbb{R}^{m \times n}$$ will have the following structure:

C = [ c ... c ]
    [ c ... c ]
    [   ...   ]
    [ c ... c ]
    [ ------- ]
    [ 0 ... 0 ]
    [   ...   ]
    [ 0 ... 0 ]
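
The row restriction can be illustrated without BLASFEO. Below is a minimal plain-C sketch (not BLASFEO code) of the solve $$C A^T = B$$ on column-major arrays, under the assumption that only the lower triangle of $$A$$ is used, as a right/lower/transposed triangular solve would. The function name `trsm_rltn_block`, the row count `m1` of the nonzero block and the count `p` of leading zero columns of $$B$$ are hypothetical names introduced for illustration. Under that assumption, forward substitution leaves the first `p` columns of the result zero as well, so the sketch starts at column `p`; with `p = 0` it reduces to the plain row-restricted solve.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major element access with leading dimension ld. */
#define EL(M, ld, i, j) ((M)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Reference solve of X * L^T = B, with L lower triangular (n x n),
 * restricted to the first m1 rows of B and X.
 * Assumes B(0:m1, 0:p) is zero, so X(0:m1, 0:p) is zero too and the
 * substitution can start at column p. Rows m1 and below are not touched. */
void trsm_rltn_block(int m1, int n, int p,
                     const double *L, int ldl,
                     const double *B, int ldb,
                     double *X, int ldx)
{
    assert(m1 >= 0 && 0 <= p && p <= n);

    /* The skipped columns are known to be zero. */
    for (int j = 0; j < p; ++j)
        for (int i = 0; i < m1; ++i)
            EL(X, ldx, i, j) = 0.0;

    /* Forward substitution over the remaining columns. */
    for (int j = p; j < n; ++j) {
        for (int i = 0; i < m1; ++i) {
            double s = EL(B, ldb, i, j);
            for (int k = p; k < j; ++k)      /* terms with k < p vanish */
                s -= EL(X, ldx, i, k) * EL(L, ldl, j, k);
            EL(X, ldx, i, j) = s / EL(L, ldl, j, j);
        }
    }
}
```

In a library call, the same restriction would presumably be expressed through the size and offset arguments of the routine rather than a custom loop; that mapping is not shown or tested here.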

As a final step, I then run the blasfeo_dsyrk_ln kernel to calculate E = D - C * C^T, where $$D,E\in\mathbb{R}^{m \times m}$$ are again fully dense. But since

C * C^T = [ c ... c | 0 ... 0 ]
          [ c ... c | 0 ... 0 ]
          [   ...   |   ...   ]
          [ c ... c | 0 ... 0 ]
          [ ------- | ------- ]
          [ 0 ... 0 | 0 ... 0 ]
          [   ...   |   ...   ]
          [ 0 ... 0 | 0 ... 0 ]

there is again a bunch of work we can skip.
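
The block structure follows directly once the nonzero rows of $$C$$ are written as $$C_1\in\mathbb{R}^{m_1 \times n}$$, where $$m_1$$ (a symbol introduced here for illustration) is the number of nonzero rows:

$$
C C^T = \begin{bmatrix} C_1 \\ 0 \end{bmatrix} \begin{bmatrix} C_1^T & 0 \end{bmatrix} = \begin{bmatrix} C_1 C_1^T & 0 \\ 0 & 0 \end{bmatrix}
$$

Only the leading $$m_1 \times m_1$$ block of $$E$$ therefore differs from $$D$$. Counting one multiply-add per inner-product term, the lower triangle of that block costs $$\tfrac{1}{2} m_1 (m_1 + 1)\, n$$ multiply-adds, compared with $$\tfrac{1}{2} m (m + 1)\, n$$ for the full lower triangle.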

Now, I was looking at the BLASFEO source code, and it looks like I would need to implement my own custom kernels to achieve this. I think I can pass a smaller m parameter to blasfeo_dtrsm_rltn to at least skip the work for the lower block of B, but I would of course still be multiplying the left block of zeros. I could do the same for blasfeo_dsyrk_ln to calculate only the upper-left block of C * C^T, but I would need to split it into two operations since I'm subtracting it from D, i.e.

E = D
E = E - C * C^T on the upper left submatrix
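
The two steps above can be sketched in plain C on column-major arrays. This is a minimal reference version (not BLASFEO code) that works on the lower triangle only; the function name `syrk_ln_block` and the parameter `m1` (number of nonzero rows of C) are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Column-major element access with leading dimension ld. */
#define EL(M, ld, i, j) ((M)[(size_t)(j) * (size_t)(ld) + (size_t)(i)])

/* Lower-triangular update E = D - C * C^T, where only the first m1 rows
 * of C (m x k) are nonzero. C1 points at the top of C, ldc is its
 * leading dimension. The strictly upper triangle of E is not written. */
void syrk_ln_block(int m, int m1, int k,
                   const double *C1, int ldc,
                   const double *D, int ldd,
                   double *E, int lde)
{
    assert(0 <= m1 && m1 <= m && k >= 0);

    /* Step 1: E = D on the lower triangle (diagonal included). */
    for (int j = 0; j < m; ++j)
        for (int i = j; i < m; ++i)
            EL(E, lde, i, j) = EL(D, ldd, i, j);

    /* Step 2: subtract C1 * C1^T from the leading m1 x m1 block only. */
    for (int j = 0; j < m1; ++j) {
        for (int i = j; i < m1; ++i) {
            double s = 0.0;
            for (int l = 0; l < k; ++l)
                s += EL(C1, ldc, i, l) * EL(C1, ldc, j, l);
            EL(E, lde, i, j) -= s;
        }
    }
}
```

With C stored as an m x n array of leading dimension m, a call such as `syrk_ln_block(m, m1, n, C, m, D, m, E, m)` performs the rank-n update on the leading block only; the remaining rows of E cost just the copy from D.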

Is there another way to achieve these optimizations, or do I have to resort to custom kernels if I want to squeeze the last drop of performance out of these operations?
