I have a very particular problem: I have matrices in which I know that certain blocks are zero, so I should be able to save some work by skipping them. In particular, I'm currently looking at the blasfeo_dtrsm_rltn and blasfeo_dsyrk_ln kernels.
Let's assume I have a dense matrix $$A\in\mathbb{R}^{n \times n}$$ and a matrix $$B\in\mathbb{R}^{m \times n}$$, where $$B$$ has the following structure:
B = [ 0 ... 0 | b ... b ]
    [ 0 ... 0 | b ... b ]
    [   ...   |   ...   ]
    [ 0 ... 0 | b ... b ]
    [ ------- | ------- ]
    [ 0 ... 0 | 0 ... 0 ]
    [   ...   |   ...   ]
    [ 0 ... 0 | 0 ... 0 ]
i.e., only the top right block of $$B$$ has values and the rest are zeros. If I then run the blasfeo_dtrsm_rltn kernel to calculate C = B * A^{-T}, then $$C\in\mathbb{R}^{m \times n}$$ will have the following structure:
C = [ c ... c ]
    [ c ... c ]
    [   ...   ]
    [ c ... c ]
    [ ------- ]
    [ 0 ... 0 ]
    [   ...   ]
    [ 0 ... 0 ]
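To make the row-skipping idea concrete, here is a minimal plain-C sketch (not BLASFEO code) of a reference solve of X * A^T = B that only processes the first `rows` rows. It assumes A is lower triangular, as the `rltn` variant implies, and uses row-major storage. The function names, the sizes `M`, `M1`, `N`, and the matrix values are all made up for illustration.

```c
#include <assert.h>
#include <string.h>

enum { M = 4, M1 = 2, N = 3 };  /* hypothetical sizes; only the first M1 rows of B are nonzero */

/* Reference solve of X * A^T = B for the first `rows` rows only.
 * A is n x n lower triangular (only its lower triangle is read);
 * B and X are row-major with n columns. Rows >= `rows` of X are not touched. */
static void trsm_rltn_ref(int rows, int n, const double *A, const double *B, double *X)
{
    for (int i = 0; i < rows; i++) {
        for (int j = 0; j < n; j++) {
            assert(A[j * n + j] != 0.0);        /* A must be nonsingular */
            double s = B[i * n + j];
            for (int k = 0; k < j; k++)
                s -= X[i * n + k] * A[j * n + k];
            X[i * n + j] = s / A[j * n + j];
        }
    }
}

/* Returns the largest absolute difference between solving all M rows and
 * solving only the first M1 rows into a zero-initialised result. */
double restricted_vs_full_trsm(void)
{
    const double A[N * N] = { 2, 0, 0,
                              1, 3, 0,
                              4, 5, 6 };
    const double B[M * N] = { 0, 1, 2,
                              0, 3, 4,
                              0, 0, 0,
                              0, 0, 0 };
    double C_full[M * N], C_part[M * N];

    trsm_rltn_ref(M, N, A, B, C_full);          /* all rows */

    memset(C_part, 0, sizeof C_part);           /* zero rows stay zero */
    trsm_rltn_ref(M1, N, A, B, C_part);         /* only the nonzero rows */

    double err = 0.0;
    for (int i = 0; i < M * N; i++) {
        double d = C_full[i] - C_part[i];
        if (d < 0.0) d = -d;
        if (d > err) err = d;
    }
    return err;
}
```

Calling `restricted_vs_full_trsm()` should return zero, since each row of the solve is independent of the others and a zero row of B yields a zero row of C. With the real kernel, the analogous step would presumably be to pass the reduced row count as `m`, which should be checked against the BLASFEO headers.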
As a final step, I then run the blasfeo_dsyrk_ln kernel to calculate E = D - C * C^T, where $$D,E\in\mathbb{R}^{m \times m}$$ are again fully dense. But since
C * C^T = [ c ... c | 0 ... 0 ]
          [ c ... c | 0 ... 0 ]
          [   ...   |   ...   ]
          [ c ... c | 0 ... 0 ]
          [ ------- | ------- ]
          [ 0 ... 0 | 0 ... 0 ]
          [   ...   |   ...   ]
          [ 0 ... 0 | 0 ... 0 ]
there is again a bunch of work we can skip.
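Written in block form, the saving becomes explicit. The names $$C_1$$, $$D_{11}$$, $$D_{12}$$, $$D_{21}$$, $$D_{22}$$ and $$m_1$$ (the number of nonzero rows of $$C$$) are introduced here only for illustration:

$$
C = \begin{bmatrix} C_1 \\ 0 \end{bmatrix},\quad C_1\in\mathbb{R}^{m_1 \times n}
\quad\Longrightarrow\quad
C C^T = \begin{bmatrix} C_1 C_1^T & 0 \\ 0 & 0 \end{bmatrix},\qquad
E = D - C C^T = \begin{bmatrix} D_{11} - C_1 C_1^T & D_{12} \\ D_{21} & D_{22} \end{bmatrix}
$$

Only the $$m_1 \times m_1$$ block $$D_{11}$$ is actually modified, so the arithmetic of the rank-$$n$$ update scales with $$m_1^2 n$$ rather than $$m^2 n$$; the remaining blocks of $$E$$ are plain copies of $$D$$.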
Now, I was looking at the source code of blasfeo, and it looks like I would need to implement my own custom kernels to achieve this. I think I can pass a smaller m parameter to blasfeo_dtrsm_rltn to at least skip the work for the lower block of B, but I would of course still be multiplying the left block of zeros. I could do the same for blasfeo_dsyrk_ln to calculate only the upper left block of C * C^T, but I would need to split it into two operations since I'm subtracting it from D, i.e.
E = D
E = E - C * C^T on the upper left submatrix
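The two steps above can be sketched in plain C as follows. This is a reference loop, not BLASFEO code: it assumes row-major storage and distinct buffers for D and E, and it updates only the lower triangle, mirroring the `ln` variant. The function names, sizes, and matrix values are hypothetical.

```c
#include <assert.h>
#include <string.h>

enum { M = 4, M1 = 2, N = 3 };  /* hypothetical sizes; only the first M1 rows of C are nonzero */

/* Step 1: E = D (full m x m copy; D and E must not overlap).
 * Step 2: E[0:m1, 0:m1] -= C1 * C1^T, lower triangle only,
 *         where C1 is the first m1 rows of the row-major m x n matrix C. */
static void syrk_ln_block_ref(int m, int m1, int n,
                              const double *C, const double *D, double *E)
{
    assert(m1 <= m);
    memcpy(E, D, (size_t)m * (size_t)m * sizeof *E);
    for (int i = 0; i < m1; i++) {
        for (int j = 0; j <= i; j++) {
            double s = 0.0;
            for (int k = 0; k < n; k++)
                s += C[i * n + k] * C[j * n + k];
            E[i * m + j] -= s;
        }
    }
}

/* Returns the largest absolute difference between updating the full
 * M x M matrix and updating only the upper left M1 x M1 block. */
double block_vs_full_syrk(void)
{
    const double C[M * N] = { 1, 2, 3,
                              4, 5, 6,
                              0, 0, 0,
                              0, 0, 0 };
    double D[M * M], E_full[M * M], E_block[M * M];
    for (int i = 0; i < M * M; i++)
        D[i] = 10.0 + i;                         /* arbitrary test values */

    syrk_ln_block_ref(M, M, N, C, D, E_full);    /* update everything */
    syrk_ln_block_ref(M, M1, N, C, D, E_block);  /* update upper left block only */

    double err = 0.0;
    for (int i = 0; i < M * M; i++) {
        double d = E_full[i] - E_block[i];
        if (d < 0.0) d = -d;
        if (d > err) err = d;
    }
    return err;
}
```

`block_vs_full_syrk()` should return zero, because every skipped entry would only have had zero subtracted from it. If D may be overwritten, so that E and D are the same matrix, the copy in step 1 would not be needed at all and a single reduced-size update would suffice.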
Would there be another way to achieve these optimizations, or do I have to resort to custom kernels if I want to squeeze the last drop of performance out of these operations?