Replies: 3 comments 1 reply
-
Looking at the code, it appears that the 0, 1 case optimization for variables is not also done for dynamic parameters. Do you think you could modify azmul.hpp and do the zero / one optimization for dynamic parameters (and make a corresponding pull request)? If so, please first make sure that you can run the check_all.sh script on your system. This may require some setup, like installing xrst.
-
Make sure you use the master branch and can run bin/check_all.sh before making any changes.
-
I made PR #232. Note I describe that it is lacking tests and so likely isn't ready to merge, that
It does appear to me that the actual code tests ran. I am not sure. I also ran
-
Hi @bradbell, we have had CppAD successfully deployed with nimble for a while now. Thanks for making this possible.
In `azmul.hpp`, lines 191-199 (the case where neither argument is `variable` and at least one is `dynamic`), would it work to catch cases when `x` or `y` is `constant` and exactly 0 or 1, and then avoid putting a `zmul_dyn` operator in the vector of `dynamic` parameter operators used by `new_dynamic`? In the preceding cases where `x` or `y` is `variable`, there is careful catching of cases with the other argument a constant `0` or `1` to avoid recording unnecessary operations. However, am I following correctly that this doesn't happen for the case where `x` and `y` are one `dynamic` and one `constant`?

The same question applies to `add.hpp` (and possibly others?).

I very well may not be following the internal details sufficiently.
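
To make the question concrete, here is a minimal sketch of the kind of short-circuit I have in mind. It is a toy recorder, not CppAD's code: `Kind`, `Param`, `DynRecorder`, and `n_mul_ops` are hypothetical names invented for illustration, and the real classification and recording logic in `azmul.hpp` is more involved. Note that treating `0 * dynamic` as a constant zero ignores the nan/inf case, which I assume is acceptable for an absolute-zero multiply but may not be for ordinary multiplication.

```cpp
#include <cstddef>

// Hypothetical toy types; none of these names come from CppAD.
enum class Kind { Constant, Dynamic };

struct Param {
    Kind   kind;   // constant, or a dynamic parameter that can change later
    double value;  // value at recording time
};

struct DynRecorder {
    std::size_t n_mul_ops = 0;  // count of recorded dynamic multiply operators

    // Multiply two parameters, recording an operator only when it is needed.
    Param mul(const Param& x, const Param& y) {
        const bool x_const = x.kind == Kind::Constant;
        const bool y_const = y.kind == Kind::Constant;

        // constant * constant: fold immediately, nothing to record
        if (x_const && y_const) return {Kind::Constant, x.value * y.value};

        // constant 0 times anything: result is the constant 0
        // (assumption: nan/inf in the dynamic argument is ignored)
        if (x_const && x.value == 0.0) return {Kind::Constant, 0.0};
        if (y_const && y.value == 0.0) return {Kind::Constant, 0.0};

        // constant 1 times a dynamic: reuse the dynamic, record nothing
        if (x_const && x.value == 1.0) return y;
        if (y_const && y.value == 1.0) return x;

        // general case: record one dynamic multiply operator
        ++n_mul_ops;
        return {Kind::Dynamic, x.value * y.value};
    }
};
```

With this sketch, multiplying a dynamic by a constant `1` or `0` leaves `n_mul_ops` unchanged, while multiplying by a constant `2` records one operator. The analogous check for addition would only need the constant `0` case.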
Here is a summary of the use case that led me to this. (I can provide code but will try to explain.) Consider taping calculation of a quadratic form `y = x^T A x` where `x` is an n-by-1 matrix and `A` is an n-by-n matrix. `y` is a scalar. Everything is taped with your scalar multiplication and addition operators. Then consider obtaining the Hessian of this with respect to all elements of `x` (i.e., the correct answer is `2A` for symmetric `A`) by triple taping: (i) tape `y = x^T A x`; (ii) tape calculation of the Jacobian of (i) by playing (i) forward 0 and reverse 1; and (iii) tape calculation of the Hessian of (i) by playing (ii) forward 0 (once) and then reverse 1, n times, once each with a single element of w being 1 instead of 0. Finally, `optimize()` the third tape.

In this case, if elements of `A` are all `constant`, the result is ideal: The optimize()d third tape has successfully obtained the result as simply twice the elements of `A`, already calculated as constants, and virtually no computations are needed. (Before optimize(), the third tape still includes the forward 0 calculations, but optimization successfully detects that none of that is needed for the Hessian elements, which all result from multiplication and addition of only `constant` arguments.)

However, if elements of `A` are all `dynamic` (for all three levels of taping), then the `new_dynamic` operations in the third tape get very large at a fast rate in relation to the size of `A`. And they are all (or almost all) multiplications by 1 or additions of 0. With `n=500`, creating the third tape (before optimizing it) looks like it takes about 10Gb. Playing that tape takes at least an additional 10Gb. (And that is before optimizing it, which seems to shorten the `new_dynamic` operations somewhat but does not eliminate identity multiplications or additions.)

What I mean by using `dynamic` elements for all three levels of taping is that each tape includes a dynamic vector that is used as the elements of `A`. When recording one tape by playing another tape, the playing is preceded by a call to `new_dynamic` so that the dynamic vector for the new tape is used as the updated dynamic values of the tape that will be played.

I was surprised that `constant` vs `dynamic` would give such a huge difference in memory performance. Is there a way to make recording of `dynamic` operations catch cases where efficiency can be gained? Above, I tried to see where that might be done but might be off track.

I can definitely work around this but wonder if it is feasible and/or sensible to improve.
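
For reference, the expected Hessian of the quadratic form above can be checked with plain arithmetic, without any taping. The sketch below is not the nimble or CppAD code; `Vec`, `Mat`, `quad_form`, and `fd_hessian` are hypothetical names made up for this illustration. It uses central differences, which are exact for a quadratic up to rounding, and the result is `A + A^T`, which reduces to `2A` when `A` is symmetric.

```cpp
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;
using Mat = std::vector<Vec>;  // n-by-n, stored as rows

// y = x^T A x, using only scalar multiplication and addition
double quad_form(const Mat& A, const Vec& x) {
    double y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i)
        for (std::size_t j = 0; j < x.size(); ++j)
            y += x[i] * A[i][j] * x[j];
    return y;
}

// Hessian of quad_form at x by second-order central differences with step h.
// For a quadratic function the truncation error is zero.
Mat fd_hessian(const Mat& A, const Vec& x, double h = 1.0) {
    const std::size_t n = x.size();
    Mat H(n, Vec(n, 0.0));
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            // evaluate the quadratic form at x shifted by di in x[i], dj in x[j]
            auto f = [&](double di, double dj) {
                Vec z = x;
                z[i] += di;
                z[j] += dj;
                return quad_form(A, z);
            };
            H[i][j] = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h * h);
        }
    }
    return H;
}
```

For a non-symmetric `A` such as rows `{1, 2}` and `{3, 4}`, `fd_hessian` returns rows `{2, 5}` and `{5, 8}`, that is `A + A^T`, independent of the point `x`.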
Thank you.
Perry