| 1 | +<?xml version='1.0' encoding='utf-8' standalone='no'?> |
| 2 | +<!DOCTYPE issue SYSTEM "lwg-issue.dtd"> |
| 3 | + |
| 4 | +<issue num="4137" status="New"> |
| 5 | +<title>Fix <i>Mandates</i>, <i>Preconditions</i>, and <i>Complexity</i> elements of [linalg] algorithms</title> |
| 6 | +<section><sref ref="[linalg.algs.blas2]"/><sref ref="[linalg.algs.blas3]"/></section> |
| 7 | +<submitter>Mark Hoemmen</submitter> |
| 8 | +<date>08 Aug 2024</date> |
| 9 | +<priority>99</priority> |
| 10 | + |
| 11 | +<discussion> |
| 12 | +<p> |
| 13 | +As <a href="https://github.com/ORNL/cpp-proposals-pub/issues/464">pointed out by Raffaele Solcà</a> |
| 14 | +(CSCS Swiss National Supercomputing Centre), several <i>Mandates</i>, <i>Preconditions</i>, and |
| 15 | +<i>Complexity</i> elements of BLAS 2 and BLAS 3 algorithms in [linalg] are incorrect. |
| 16 | +</p> |
| 17 | +</discussion> |
| 18 | + |
| 19 | +<resolution> |
| 20 | +<p> |
| 21 | +This wording is relative to <paper num="N4988"/>. |
| 22 | +</p> |
| 23 | + |
| 24 | +<ol> |
| 25 | + |
| 26 | +<li><p>Modify <sref ref="[linalg.algs.blas2.gemv]"/> as indicated:</p> |
| 27 | + |
| 28 | +<blockquote class="note"> |
| 29 | +<p> |
| 30 | +[<i>Drafting note</i>: This change is needed because the matrix <tt>A</tt> does not need to be square. |
| 31 | +<tt>x.extent(0)</tt> must equal <tt>A.extent(1)</tt>, while <tt>y.extent(0)</tt> must equal |
| 32 | +<tt>A.extent(0)</tt>.] |
| 33 | +</p> |
| 34 | +</blockquote> |
| 35 | + |
| 36 | +<blockquote> |
| 37 | +<p> |
| 38 | +-3- <i>Mandates</i>: |
| 39 | +</p> |
| 40 | +<ol style="list-style-type: none"> |
| 41 | +<li><p>(3.1) — <tt><i>possibly-multipliable</i><decltype(A), decltype(x), decltype(y)>()</tt> |
| 42 | +is <tt>true</tt>, and</p></li> |
| 43 | +<li><p>(3.2) — <tt><i>possibly-addable</i><decltype(<ins>y</ins><del>x</del>), decltype(y), |
| 44 | +decltype(z)>()</tt> is <tt>true</tt> for those overloads that take a <tt>z</tt> parameter.</p></li> |
| 45 | +</ol> |
| 46 | +<p> |
| 47 | +-4- <i>Preconditions</i>: |
| 48 | +</p> |
| 49 | +<ol style="list-style-type: none"> |
| 50 | +<li><p>(4.1) — <tt><i>multipliable</i>(A, x, y)</tt> is <tt>true</tt>, and</p></li> |
| 51 | +<li><p>(4.2) — <tt><i>addable</i>(<ins>y</ins><del>x</del>, y, z)</tt> is <tt>true</tt> |
| 52 | +for those overloads that take a <tt>z</tt> parameter.</p></li> |
| 53 | +</ol> |
| 54 | +<p> |
| 55 | +-5- <i>Complexity</i>: 𝒪(<tt><ins>A</ins><del>x</del>.extent(0)</tt> × |
| 56 | +<tt><ins>x</ins><del>A</del>.extent(<ins>0</ins><del>1</del>)</tt>). |
| 57 | +</p> |
| 58 | +</blockquote> |
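To see why the revised elements refer to <tt>A.extent(0)</tt> and to <tt>y</tt> rather than <tt>x</tt>, here is a minimal C++ sketch of the gemv computation over a plain row-major buffer. The <tt>Matrix</tt> struct and <tt>gemv_sketch</tt> function are hypothetical stand-ins for an <tt>mdspan</tt> and for <tt>std::linalg::matrix_vector_product</tt>, not the [linalg] API; the asserts model the <i>multipliable</i> precondition, and the return value counts multiply-adds.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix standing in for a rank-2 mdspan:
// rows plays the role of A.extent(0), cols the role of A.extent(1).
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;  // rows * cols entries, row-major
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Computes y = A x for a possibly non-square A and returns the number of
// multiply-adds performed, so the count can be compared with Complexity.
std::size_t gemv_sketch(const Matrix& A, const std::vector<double>& x,
                        std::vector<double>& y) {
  // multipliable(A, x, y): x.extent(0) == A.extent(1), y.extent(0) == A.extent(0)
  assert(x.size() == A.cols);
  assert(y.size() == A.rows);
  std::size_t ops = 0;
  for (std::size_t i = 0; i < A.rows; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < A.cols; ++j) {
      sum += A(i, j) * x[j];
      ++ops;
    }
    y[i] = sum;
  }
  return ops;  // exactly A.extent(0) * A.extent(1), i.e. A.extent(0) * x.extent(0)
}
```

For a 2 × 3 matrix <tt>A</tt>, the loop performs 2 × 3 = 6 multiply-adds, while the original formula <tt>x.extent(0)</tt> × <tt>A.extent(1)</tt> would give 3 × 3 = 9. Likewise <tt>A x</tt> has length <tt>A.extent(0)</tt>, which matches <tt>y</tt>, not <tt>x</tt>.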
| 59 | + |
| 60 | +</li> |
| 61 | + |
| 62 | +<li><p>Modify <sref ref="[linalg.algs.blas2.symv]"/> as indicated:</p> |
| 63 | + |
| 64 | +<blockquote> |
| 65 | +<p> |
| 66 | +-3- <i>Mandates</i>: |
| 67 | +</p> |
| 68 | +<ol style="list-style-type: none"> |
| 69 | +<li><p>(3.1) — […]</p></li> |
| 70 | +<li><p>(3.2) — […]</p></li> |
| 71 | +<li><p>(3.3) — <tt><i>possibly-multipliable</i><decltype(A), decltype(x), decltype(y)>()</tt> |
| 72 | +is <tt>true</tt>, and</p></li> |
| 73 | +<li><p>(3.4) — <tt><i>possibly-addable</i><decltype(<ins>y</ins><del>x</del>), decltype(y), |
| 74 | +decltype(z)>()</tt> is <tt>true</tt> for those overloads that take a <tt>z</tt> parameter.</p></li> |
| 75 | +</ol> |
| 76 | +<p> |
| 77 | +-4- <i>Preconditions</i>: |
| 78 | +</p> |
| 79 | +<ol style="list-style-type: none"> |
| 80 | +<li><p>(4.1) — <tt>A.extent(0)</tt> equals <tt>A.extent(1)</tt>,</p></li> |
| 81 | +<li><p>(4.2) — <tt><i>multipliable</i>(A, x, y)</tt> is <tt>true</tt>, and</p></li> |
| 82 | +<li><p>(4.3) — <tt><i>addable</i>(<ins>y</ins><del>x</del>, y, z)</tt> is <tt>true</tt> |
| 83 | +for those overloads that take a <tt>z</tt> parameter.</p></li> |
| 84 | +</ol> |
| 85 | +<p> |
| 86 | +-5- <i>Complexity</i>: 𝒪(<tt><ins>A</ins><del>x</del>.extent(0)</tt> × |
| 87 | +<tt><ins>x</ins><del>A</del>.extent(<ins>0</ins><del>1</del>)</tt>). |
| 88 | +</p> |
| 89 | +</blockquote> |
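A minimal C++ sketch of the symv computation follows, assuming the upper triangle of <tt>A</tt> is the one accessed. <tt>Matrix</tt> and <tt>symv_upper_sketch</tt> are hypothetical names, not the signature of <tt>std::linalg::symmetric_matrix_vector_product</tt>; the asserts mirror preconditions (4.1) and (4.2), and entries below the diagonal are mirrored from the stored triangle instead of being read.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix standing in for a rank-2 mdspan.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Computes y = A x where A is symmetric and only its upper triangle
// (entries with j >= i) is read; the strictly lower triangle is ignored.
std::size_t symv_upper_sketch(const Matrix& A, const std::vector<double>& x,
                              std::vector<double>& y) {
  assert(A.rows == A.cols);                          // A.extent(0) equals A.extent(1)
  assert(x.size() == A.cols && y.size() == A.rows);  // multipliable(A, x, y)
  std::size_t ops = 0;
  for (std::size_t i = 0; i < A.rows; ++i) {
    double sum = 0.0;
    for (std::size_t j = 0; j < A.cols; ++j) {
      // Mirror the stored upper triangle to obtain A(i, j) for j < i.
      const double a = (j >= i) ? A(i, j) : A(j, i);
      sum += a * x[j];
      ++ops;
    }
    y[i] = sum;
  }
  return ops;  // A.extent(0) * A.extent(1) multiply-adds
}
```

Filling the unused lower triangle with a sentinel such as 99 is a quick way to confirm that only the upper triangle contributes to <tt>y</tt>.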
| 90 | + |
| 91 | +</li> |
| 92 | + |
| 93 | +<li><p>Modify <sref ref="[linalg.algs.blas2.hemv]"/> as indicated:</p> |
| 94 | + |
| 95 | +<blockquote> |
| 96 | +<p> |
| 97 | +-3- <i>Mandates</i>: |
| 98 | +</p> |
| 99 | +<ol style="list-style-type: none"> |
| 100 | +<li><p>(3.1) — […]</p></li> |
| 101 | +<li><p>(3.2) — […]</p></li> |
| 102 | +<li><p>(3.3) — <tt><i>possibly-multipliable</i><decltype(A), decltype(x), decltype(y)>()</tt> |
| 103 | +is <tt>true</tt>, and</p></li> |
| 104 | +<li><p>(3.4) — <tt><i>possibly-addable</i><decltype(<ins>y</ins><del>x</del>), decltype(y), |
| 105 | +decltype(z)>()</tt> is <tt>true</tt> for those overloads that take a <tt>z</tt> parameter.</p></li> |
| 106 | +</ol> |
| 107 | +<p> |
| 108 | +-4- <i>Preconditions</i>: |
| 109 | +</p> |
| 110 | +<ol style="list-style-type: none"> |
| 111 | +<li><p>(4.1) — <tt>A.extent(0)</tt> equals <tt>A.extent(1)</tt>,</p></li> |
| 112 | +<li><p>(4.2) — <tt><i>multipliable</i>(A, x, y)</tt> is <tt>true</tt>, and</p></li> |
| 113 | +<li><p>(4.3) — <tt><i>addable</i>(<ins>y</ins><del>x</del>, y, z)</tt> is <tt>true</tt> |
| 114 | +for those overloads that take a <tt>z</tt> parameter.</p></li> |
| 115 | +</ol> |
| 116 | +<p> |
| 117 | +-5- <i>Complexity</i>: 𝒪(<tt><ins>A</ins><del>x</del>.extent(0)</tt> × |
| 118 | +<tt><ins>x</ins><del>A</del>.extent(<ins>0</ins><del>1</del>)</tt>). |
| 119 | +</p> |
| 120 | +</blockquote> |
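The Hermitian case differs from the symmetric one only in that mirrored entries are conjugated. The sketch below assumes the upper triangle is stored; <tt>CMatrix</tt> and <tt>hemv_upper_sketch</tt> are hypothetical names, not the signature of <tt>std::linalg::hermitian_matrix_vector_product</tt>.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

using cplx = std::complex<double>;

// Hypothetical row-major complex matrix standing in for a rank-2 mdspan.
struct CMatrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<cplx> data;
  cplx operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Computes y = A x where A is Hermitian and only its upper triangle is read;
// A(i, j) for j < i is taken as conj(A(j, i)).
std::size_t hemv_upper_sketch(const CMatrix& A, const std::vector<cplx>& x,
                              std::vector<cplx>& y) {
  assert(A.rows == A.cols);                          // A.extent(0) equals A.extent(1)
  assert(x.size() == A.cols && y.size() == A.rows);  // multipliable(A, x, y)
  std::size_t ops = 0;
  for (std::size_t i = 0; i < A.rows; ++i) {
    cplx sum(0.0, 0.0);
    for (std::size_t j = 0; j < A.cols; ++j) {
      const cplx a = (j >= i) ? A(i, j) : std::conj(A(j, i));
      sum += a * x[j];
      ++ops;
    }
    y[i] = sum;
  }
  return ops;  // A.extent(0) * A.extent(1) multiply-adds
}
```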
| 121 | + |
| 122 | +</li> |
| 123 | + |
| 124 | +<li><p>Modify <sref ref="[linalg.algs.blas2.trmv]"/> as indicated:</p> |
| 125 | + |
| 126 | +<blockquote class="note"> |
| 127 | +<p> |
| 128 | +[<i>Drafting note</i>: The extents compatibility conditions are expressed differently from those in the |
| 129 | +matrix-vector multiply sections above, perhaps for consistency with the TRSV section below. |
| 130 | +They look correct here. The original <i>Complexity</i> elements adjusted below are technically correct, |
| 131 | +since <math><mi>A</mi></math> is square, but changing them improves consistency with |
| 132 | +<sref ref="[linalg.algs.blas2.gemv]"/>.] |
| 133 | +</p> |
| 134 | +</blockquote> |
| 135 | + |
| 136 | +<blockquote> |
| 137 | +<pre> |
| 138 | +template<<i>in-matrix</i> InMat, class Triangle, class DiagonalStorage, <i>in-vector</i> InVec, |
| 139 | + <i>out-vector</i> OutVec> |
| 140 | + void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InVec x, OutVec y); |
| 141 | +template<class ExecutionPolicy, |
| 142 | + <i>in-matrix</i> InMat, class Triangle, class DiagonalStorage, <i>in-vector</i> InVec, |
| 143 | + <i>out-vector</i> OutVec> |
| 144 | + void triangular_matrix_vector_product(ExecutionPolicy&& exec, |
| 145 | + InMat A, Triangle t, DiagonalStorage d, InVec x, OutVec y); |
| 146 | +</pre> |
| 147 | +<blockquote> |
| 148 | +<p> |
| 149 | +-5- […] |
| 150 | +<p/> |
| 151 | +-6- <i>Effects</i>: Computes <math><mi>y</mi> <mo>=</mo> <mi>A</mi><mi>x</mi></math>. |
| 152 | +<p/> |
| 153 | +-7- <i>Complexity</i>: 𝒪(<tt><ins>A</ins><del>x</del>.extent(0)</tt> × |
| 154 | +<tt><ins>x</ins><del>A</del>.extent(<ins>0</ins><del>1</del>)</tt>). |
| 155 | +</p> |
| 156 | +</blockquote> |
| 157 | +<pre> |
| 158 | +template<<i>in-matrix</i> InMat, class Triangle, class DiagonalStorage, <i>inout-vector</i> InOutVec> |
| 159 | + void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, InOutVec y); |
| 160 | +template<class ExecutionPolicy, |
| 161 | + <i>in-matrix</i> InMat, class Triangle, class DiagonalStorage, <i>inout-vector</i> InOutVec> |
| 162 | + void triangular_matrix_vector_product(ExecutionPolicy&& exec, |
| 163 | + InMat A, Triangle t, DiagonalStorage d, InOutVec y); |
| 164 | +</pre> |
| 165 | +<blockquote> |
| 166 | +<p> |
| 167 | +-8- […] |
| 168 | +<p/> |
| 169 | +-9- <i>Effects</i>: […] |
| 170 | +<p/> |
| 171 | +-10- <i>Complexity</i>: 𝒪(<tt><ins>A</ins><del>y</del>.extent(0)</tt> × |
| 172 | +<tt><ins>y</ins><del>A</del>.extent(<ins>0</ins><del>1</del>)</tt>). |
| 173 | +</p> |
| 174 | +</blockquote> |
| 175 | +<pre> |
| 176 | +template<<i>in-matrix</i> InMat, class Triangle, class DiagonalStorage, |
| 177 | + <i>in-vector</i> InVec1, <i>in-vector</i> InVec2, <i>out-vector</i> OutVec> |
| 178 | + void triangular_matrix_vector_product(InMat A, Triangle t, DiagonalStorage d, |
| 179 | + InVec1 x, InVec2 y, OutVec z); |
| 180 | +template<class ExecutionPolicy, |
| 181 | + <i>in-matrix</i> InMat, class Triangle, class DiagonalStorage, |
| 182 | + <i>in-vector</i> InVec1, <i>in-vector</i> InVec2, <i>out-vector</i> OutVec> |
| 183 | + void triangular_matrix_vector_product(ExecutionPolicy&& exec, |
| 184 | + InMat A, Triangle t, DiagonalStorage d, |
| 185 | + InVec1 x, InVec2 y, OutVec z); |
| 186 | +</pre> |
| 187 | +<blockquote> |
| 188 | +<p> |
| 189 | +-11- […] |
| 190 | +<p/> |
| 191 | +-12- <i>Effects</i>: Computes <math><mi>z</mi> <mo>=</mo> <mi>y</mi> <mo>+</mo> <mi>A</mi><mi>x</mi></math>. |
| 192 | +<p/> |
| 193 | +-13- <i>Complexity</i>: 𝒪(<tt><ins>A</ins><del>x</del>.extent(0)</tt> × |
| 194 | +<tt><ins>x</ins><del>A</del>.extent(<ins>0</ins><del>1</del>)</tt>). |
| 195 | +</p> |
| 196 | +</blockquote> |
| 197 | +</blockquote> |
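The overload that overwrites <tt>y</tt> is the least obvious of the three, because it reads and writes the same vector. Below is a minimal sketch, assuming a lower-triangular <tt>A</tt>; the <tt>bool</tt> flag is a hypothetical stand-in for the <tt>DiagonalStorage</tt> tag, and <tt>Matrix</tt> and <tt>trmv_lower_inplace_sketch</tt> are illustrative names only. Its inner loops perform about n(n+1)/2 multiply-adds for an n × n <tt>A</tt>, which is within 𝒪(<tt>A.extent(0)</tt> × <tt>y.extent(0)</tt>).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix standing in for a rank-2 mdspan.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

// Overwrites y with A y, where A is lower triangular (only entries with j <= i
// are read). If implicit_unit_diagonal is true, the diagonal is assumed to be
// all ones and A's stored diagonal is not accessed. Rows are processed
// bottom-up so that every y[j] with j <= i still holds its input value when
// row i uses it.
void trmv_lower_inplace_sketch(const Matrix& A, bool implicit_unit_diagonal,
                               std::vector<double>& y) {
  assert(A.rows == A.cols);
  assert(y.size() == A.rows);
  for (std::size_t i = A.rows; i-- > 0;) {
    double sum = implicit_unit_diagonal ? y[i] : A(i, i) * y[i];
    for (std::size_t j = 0; j < i; ++j) {
      sum += A(i, j) * y[j];
    }
    y[i] = sum;
  }
}
```

Processing rows top-down instead would overwrite <tt>y[0]</tt> before row 1 reads it, which is why the loop runs from the last row upward.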
| 198 | + |
| 199 | +</li> |
| 200 | + |
| 201 | +<li><p>Modify <sref ref="[linalg.algs.blas3.rankk]"/> as indicated:</p> |
| 202 | + |
| 203 | +<blockquote class="note"> |
| 204 | +<p> |
| 205 | +[<i>Drafting note</i>: <paper num="P3371R0"/>, to be submitted in the August 15 mailing for |
| 206 | +LEWG review, contains the same wording changes to <sref ref="[linalg.algs.blas3.rankk]"/> |
| 207 | +and <sref ref="[linalg.algs.blas3.rank2k]"/> as proposed here, with additional changes |
| 208 | +corresponding to that proposal. Please apply this LWG issue's changes before P3371 is merged.] |
| 209 | +</p> |
| 210 | +</blockquote> |
| 211 | + |
| 212 | +<blockquote> |
| 213 | +<p> |
| 214 | +-3- <i>Mandates</i>: |
| 215 | +</p> |
| 216 | +<ol style="list-style-type: none"> |
| 217 | +<li><p>(3.1) — If <tt>InOutMat</tt> has <tt>layout_blas_packed</tt> layout, then the |
| 218 | +layout's <tt>Triangle</tt> template argument has the same type as the function's |
| 219 | +<tt>Triangle</tt> template argument; <ins>and</ins></p></li> |
| 220 | +<li><p>(3.2) — <tt><ins><i>possibly-multipliable</i><decltype(A), |
| 221 | +decltype(transposed(A)), decltype(C)>()</ins> <del><i>compatible-static-extents</i><decltype(A), |
| 222 | +decltype(A)>(0, 1)</del></tt> is <tt>true</tt><ins>.</ins><del>;</del></p></li> |
| 223 | +<li><p><del>(3.3) — <tt><i>compatible-static-extents</i><decltype(C), decltype(C)>(0, 1)</tt> |
| 224 | +is <tt>true</tt>; and</del></p></li> |
| 225 | +<li><p><del>(3.4) — <tt><i>compatible-static-extents</i><decltype(A), decltype(C)>(0, 0)</tt> |
| 226 | +is <tt>true</tt>.</del></p></li> |
| 227 | +</ol> |
| 228 | +<p> |
| 229 | +-4- <i>Preconditions</i>: <ins><tt><i>multipliable</i>(A, transposed(A), C)</tt> is <tt>true</tt>.</ins> |
| 230 | +</p> |
| 231 | +<ol style="list-style-type: none"> |
| 232 | +<li><p><del>(4.1) — <tt>A.extent(0)</tt> equals <tt>A.extent(1)</tt>,</del></p></li> |
| 233 | +<li><p><del>(4.2) — <tt>C.extent(0)</tt> equals <tt>C.extent(1)</tt>, and</del></p></li> |
| 234 | +<li><p><del>(4.3) — <tt>A.extent(0)</tt> equals <tt>C.extent(0)</tt>.</del></p></li> |
| 235 | +</ol> |
| 236 | +<p> |
| 237 | +-5- <i>Complexity</i>: 𝒪(<tt>A.extent(0)</tt> × <tt>A.extent(1)</tt> × <tt><ins>A</ins><del>C</del>.extent(0)</tt>). |
| 238 | +</p> |
| 239 | +</blockquote> |
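The old wording required <tt>A</tt> to be square, but a rank-k update only needs <tt>C</tt> to be <tt>A.extent(0)</tt> × <tt>A.extent(0)</tt>. The following minimal sketch takes an n × k <tt>A</tt> and accumulates <tt>A</tt> times its transpose into the upper triangle of <tt>C</tt>. Accumulating (rather than overwriting) and choosing the upper triangle are assumptions of the sketch; <tt>Matrix</tt> and <tt>syrk_upper_sketch</tt> are hypothetical names, not <tt>std::linalg::symmetric_matrix_rank_k_update</tt>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix standing in for a rank-2 mdspan.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
  double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// Adds A * transpose(A) to the upper triangle of C. A is n x k and need not
// be square; C must be n x n. Returns the number of multiply-adds performed.
std::size_t syrk_upper_sketch(const Matrix& A, Matrix& C) {
  // multipliable(A, transposed(A), C): C.extent(0) == A.extent(0) and
  // C.extent(1) == transposed(A).extent(1) == A.extent(0).
  assert(C.rows == A.rows && C.cols == A.rows);
  std::size_t ops = 0;
  for (std::size_t i = 0; i < A.rows; ++i) {
    for (std::size_t j = i; j < A.rows; ++j) {
      double sum = 0.0;
      for (std::size_t l = 0; l < A.cols; ++l) {
        sum += A(i, l) * A(j, l);
        ++ops;
      }
      C.at(i, j) += sum;
    }
  }
  // n(n+1)/2 * k, which is O(A.extent(0) * A.extent(1) * A.extent(0)).
  return ops;
}
```

With a 2 × 3 <tt>A</tt>, the sketch performs 3 × 3 = 9 multiply-adds, within the bound <tt>A.extent(0)</tt> × <tt>A.extent(1)</tt> × <tt>A.extent(0)</tt> = 12.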
| 240 | + |
| 241 | +</li> |
| 242 | + |
| 243 | +<li><p>Modify <sref ref="[linalg.algs.blas3.rank2k]"/> as indicated:</p> |
| 244 | + |
| 245 | +<blockquote> |
| 246 | +<p> |
| 247 | +-3- <i>Mandates</i>: |
| 248 | +</p> |
| 249 | +<ol style="list-style-type: none"> |
| 250 | +<li><p>(3.1) — If <tt>InOutMat</tt> has <tt>layout_blas_packed</tt> layout, then the |
| 251 | +layout's <tt>Triangle</tt> template argument has the same type as the function's |
| 252 | +<tt>Triangle</tt> template argument;</p></li> |
| 253 | +<li><p>(3.2) — <tt><ins><i>possibly-multipliable</i><decltype(A), |
| 254 | +decltype(transposed(B)), decltype(C)>()</ins> <del><i>possibly-addable</i><decltype(A), |
| 255 | +decltype(B), decltype(C)>()</del></tt> |
| 256 | +is <tt>true</tt>; and</p></li> |
| 257 | +<li><p>(3.3) — <tt><ins><i>possibly-multipliable</i><decltype(B), |
| 258 | +decltype(transposed(A)), decltype(C)>()</ins> <del><i>compatible-static-extents</i><decltype(A), |
| 259 | +decltype(A)>(0, 1)</del></tt> is <tt>true</tt>.</p></li> |
| 260 | +</ol> |
| 261 | +<p> |
| 262 | +-4- <i>Preconditions</i>: |
| 263 | +</p> |
| 264 | +<ol style="list-style-type: none"> |
| 265 | +<li><p>(4.1) — <tt><ins><i>multipliable</i>(A, transposed(B), C)</ins> |
| 266 | +<del><i>addable</i>(A, B, C)</del></tt> is <tt>true</tt>, and</p></li> |
| 267 | +<li><p>(4.2) — <ins><tt><i>multipliable</i>(B, transposed(A), C)</tt> is <tt>true</tt></ins> |
| 268 | +<del><tt>A.extent(0)</tt> equals <tt>A.extent(1)</tt></del>.</p></li> |
| 269 | +</ol> |
| 270 | +<p> |
| 271 | +-5- <i>Complexity</i>: 𝒪(<tt>A.extent(0)</tt> × <tt>A.extent(1)</tt> × <tt><ins>B</ins><del>C</del>.extent(0)</tt>). |
| 272 | +</p> |
| 273 | +</blockquote> |
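Taken together, the two <i>multipliable</i> conditions say that <tt>A</tt> and <tt>B</tt> are both n × k and <tt>C</tt> is n × n, with no requirement that <tt>A</tt> be square. The minimal sketch below encodes exactly those checks and accumulates <tt>A</tt> times the transpose of <tt>B</tt> plus <tt>B</tt> times the transpose of <tt>A</tt> into the upper triangle of <tt>C</tt>. As in the rank-k sketch, accumulating and the choice of triangle are assumptions, and <tt>syr2k_upper_sketch</tt> is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix standing in for a rank-2 mdspan.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
  double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// Adds A * transpose(B) + B * transpose(A) to the upper triangle of C.
// Returns the number of inner-loop iterations performed.
std::size_t syr2k_upper_sketch(const Matrix& A, const Matrix& B, Matrix& C) {
  // multipliable(A, transposed(B), C) and multipliable(B, transposed(A), C):
  assert(A.rows == B.rows && A.cols == B.cols);  // A and B are both n x k
  assert(C.rows == A.rows && C.cols == A.rows);  // C is n x n
  std::size_t ops = 0;
  for (std::size_t i = 0; i < A.rows; ++i) {
    for (std::size_t j = i; j < A.rows; ++j) {
      double sum = 0.0;
      for (std::size_t l = 0; l < A.cols; ++l) {
        sum += A(i, l) * B(j, l) + B(i, l) * A(j, l);
        ++ops;
      }
      C.at(i, j) += sum;
    }
  }
  // n(n+1)/2 * k, which is O(A.extent(0) * A.extent(1) * B.extent(0)).
  return ops;
}
```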
| 274 | + |
| 275 | +</li> |
| 276 | + |
| 277 | +<li><p>Modify <sref ref="[linalg.algs.blas3.trsm]"/> as indicated:</p> |
| 278 | + |
| 279 | +<blockquote class="note"> |
| 280 | +<p> |
| 281 | +[<i>Drafting note</i>: Nothing is wrong here, but it is preferable for <i>Complexity</i> elements to depend |
| 282 | +only on input parameters where possible.] |
| 283 | +</p> |
| 284 | +</blockquote> |
| 285 | + |
| 286 | +<blockquote> |
| 287 | +<pre> |
| 288 | +template<<i>in-matrix</i> InMat1, class Triangle, class DiagonalStorage, |
| 289 | + <i>in-matrix</i> InMat2, <i>out-matrix</i> OutMat, class BinaryDivideOp> |
| 290 | + void triangular_matrix_matrix_left_solve(InMat1 A, Triangle t, DiagonalStorage d, |
| 291 | + InMat2 B, OutMat X, BinaryDivideOp divide); |
| 292 | +template<class ExecutionPolicy, |
| 293 | + <i>in-matrix</i> InMat1, class Triangle, class DiagonalStorage, |
| 294 | + <i>in-matrix</i> InMat2, <i>out-matrix</i> OutMat, class BinaryDivideOp> |
| 295 | + void triangular_matrix_matrix_left_solve(ExecutionPolicy&& exec, |
| 296 | + InMat1 A, Triangle t, DiagonalStorage d, |
| 297 | + InMat2 B, OutMat X, BinaryDivideOp divide); |
| 298 | +</pre> |
| 299 | +<blockquote> |
| 300 | +<p> |
| 301 | +[…] |
| 302 | +<p/> |
| 303 | +-6- <i>Complexity</i>: 𝒪(<tt>A.extent(0)</tt> × <tt><ins>B</ins><del>X</del>.extent(1)</tt> × <tt><ins>B</ins><del>X</del>.extent(1)</tt>). |
| 304 | +</p> |
| 305 | +</blockquote> |
| 306 | + |
| 307 | +</blockquote> |
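Because <tt>X</tt> must have the same extents as <tt>B</tt>, replacing <tt>X</tt> with <tt>B</tt> in the <i>Complexity</i> element changes nothing numerically; it only ties the bound to an input. The minimal sketch below solves A X = B by forward substitution, one column of <tt>B</tt> at a time, assuming a lower-triangular <tt>A</tt> with an explicit diagonal. <tt>Matrix</tt> and <tt>lower_left_solve_sketch</tt> are hypothetical names; the <tt>divide</tt> argument plays the role of <tt>BinaryDivideOp</tt>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix standing in for a rank-2 mdspan.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
  double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// Solves A X = B for X, where A is m x m lower triangular with an explicit
// diagonal and B is m x n. divide(num, den) computes num / den.
template <class BinaryDivideOp>
void lower_left_solve_sketch(const Matrix& A, const Matrix& B, Matrix& X,
                             BinaryDivideOp divide) {
  assert(A.rows == A.cols);
  assert(B.rows == A.rows);
  assert(X.rows == B.rows && X.cols == B.cols);  // X has B's extents
  for (std::size_t c = 0; c < B.cols; ++c) {     // one forward substitution per column
    for (std::size_t i = 0; i < A.rows; ++i) {
      double sum = B(i, c);
      for (std::size_t j = 0; j < i; ++j) {
        sum -= A(i, j) * X(j, c);
      }
      X.at(i, c) = divide(sum, A(i, i));
    }
  }
}
```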
| 308 | + |
| 309 | +</li> |
| 310 | + |
| 311 | + |
| 312 | +<li><p>Modify <sref ref="[linalg.algs.blas3.inplacetrsm]"/> as indicated:</p> |
| 313 | + |
| 314 | +<blockquote class="note"> |
| 315 | +<p> |
| 316 | +[<i>Drafting note</i>: Nothing is wrong here, but it is preferable for <i>Complexity</i> elements to depend |
| 317 | +only on input parameters where possible.] |
| 318 | +</p> |
| 319 | +</blockquote> |
| 320 | + |
| 321 | +<blockquote> |
| 322 | +<pre> |
| 323 | +template<<i>in-matrix</i> InMat, class Triangle, class DiagonalStorage, |
| 324 | + <i>inout-matrix</i> InOutMat, class BinaryDivideOp> |
| 325 | + void triangular_matrix_matrix_right_solve(InMat A, Triangle t, DiagonalStorage d, |
| 326 | + InOutMat B, BinaryDivideOp divide); |
| 327 | +template<class ExecutionPolicy, |
| 328 | + <i>in-matrix</i> InMat, class Triangle, class DiagonalStorage, |
| 329 | + <i>inout-matrix</i> InOutMat, class BinaryDivideOp> |
| 330 | + void triangular_matrix_matrix_right_solve(ExecutionPolicy&& exec, |
| 331 | + InMat A, Triangle t, DiagonalStorage d, |
| 332 | + InOutMat B, BinaryDivideOp divide); |
| 333 | +</pre> |
| 334 | +<blockquote> |
| 335 | +<p> |
| 336 | +[…] |
| 337 | +<p/> |
| 338 | +-13- <i>Complexity</i>: 𝒪(<tt><ins>B</ins><del>A</del>.extent(0)</tt> × |
| 339 | +<tt>A.extent(<ins>0</ins><del>1</del>)</tt> × <tt><ins>A</ins><del>B</del>.extent(1)</tt>). |
| 340 | +</p> |
| 341 | +</blockquote> |
| 342 | + |
| 343 | +</blockquote> |
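For the in-place right solve, <tt>B</tt> is m × n and is overwritten with the <tt>X</tt> satisfying X A = B, where <tt>A</tt> is n × n; m need not equal n. The minimal sketch below assumes a lower-triangular <tt>A</tt> with an explicit diagonal, so each row of <tt>B</tt> is solved by sweeping its columns from right to left. Each of the m rows costs about n(n+1)/2 multiply-subtracts, consistent with 𝒪(<tt>B.extent(0)</tt> × <tt>A.extent(0)</tt> × <tt>A.extent(1)</tt>). <tt>Matrix</tt> and <tt>lower_right_solve_inplace_sketch</tt> are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical row-major matrix standing in for a rank-2 mdspan.
struct Matrix {
  std::size_t rows;
  std::size_t cols;
  std::vector<double> data;
  double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
  double& at(std::size_t i, std::size_t j) { return data[i * cols + j]; }
};

// Overwrites B with X such that X A = B, where A is n x n lower triangular
// with an explicit diagonal and B is m x n. divide(num, den) computes num / den.
template <class BinaryDivideOp>
void lower_right_solve_inplace_sketch(const Matrix& A, Matrix& B, BinaryDivideOp divide) {
  assert(A.rows == A.cols);
  assert(B.cols == A.rows);
  for (std::size_t r = 0; r < B.rows; ++r) {
    // (X A)(r, j) involves X(r, k) only for k >= j, so sweep columns right to left.
    for (std::size_t j = A.cols; j-- > 0;) {
      double sum = B(r, j);
      for (std::size_t k = j + 1; k < A.cols; ++k) {
        sum -= B(r, k) * A(k, j);  // B(r, k) already holds X(r, k)
      }
      B.at(r, j) = divide(sum, A(j, j));
    }
  }
}
```

With a 1 × 2 <tt>B</tt> and a 2 × 2 <tt>A</tt>, the row count <tt>B.extent(0)</tt> = 1 differs from <tt>A.extent(0)</tt> = 2, which is the case the original formula did not capture.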
| 344 | + |
| 345 | +</li> |
| 346 | +</ol> |
| 347 | +</resolution> |
| 348 | + |
| 349 | +</issue> |