What's the recommended approach for an operator whose Forward() computes a value that is useful in the Backward() calculation, when that value is not an input or output of the operator? I'm familiar with the long-standing approach of declaring a 'hidden output', but is that still the recommended approach?
Another point to consider is how the operator behaves in an inference-only graph. With the stateful op approach, it seems one can easily react to is_train == false and skip allocating the fwd->bwd Tensor when nothing else needs it. I'm not sure whether the 'hidden output' approach can avoid allocating that space.
FStatefulCompute is designed for this purpose:
https://github.com/apache/incubator-mxnet/blob/527573ec2b9b2696ffcafd1570cd94e2187f4c32/src/operator/rnn.cu#L35
https://github.com/apache/incubator-mxnet/blob/4a8da9ec62e8cadd7df6ad5e9ba305b777104068/src/operator/rnn-inl.h#L1523
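
To make the stateful idea concrete, here is a minimal standalone sketch of the pattern, not MXNet code. It does not use MXNet's registration macros or tensor types; the class name `SoftplusStatefulOp`, its methods, and the use of `std::vector<float>` are all hypothetical stand-ins. It only illustrates the point discussed above: the state object keeps a Forward() intermediate for Backward(), and skips that storage when `is_train` is false.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an operator state object (not an MXNet type).
// Computes softplus: y = log(1 + exp(x)). The gradient dy/dx is sigmoid(x),
// an intermediate that is neither an input nor an output of the operator.
class SoftplusStatefulOp {
 public:
  std::vector<float> Forward(const std::vector<float>& x, bool is_train) {
    std::vector<float> y(x.size());
    // Drop any stale state from a previous call.
    saved_sigmoid_.clear();
    // Only store the fwd->bwd intermediate when a backward pass will follow.
    if (is_train) saved_sigmoid_.resize(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
      const float e = std::exp(x[i]);
      y[i] = std::log1p(e);
      if (is_train) saved_sigmoid_[i] = e / (1.0f + e);
    }
    return y;
  }

  // Uses the value cached by Forward(); requires a prior training-mode call.
  std::vector<float> Backward(const std::vector<float>& dy) const {
    assert(dy.size() == saved_sigmoid_.size());
    std::vector<float> dx(dy.size());
    for (std::size_t i = 0; i < dy.size(); ++i) {
      dx[i] = dy[i] * saved_sigmoid_[i];
    }
    return dx;
  }

  // Number of cached elements; zero after an inference-mode Forward().
  std::size_t SavedStateSize() const { return saved_sigmoid_.size(); }

 private:
  std::vector<float> saved_sigmoid_;  // fwd->bwd state, hidden from the graph
};
```

In this sketch, an instance that only ever runs Forward() with `is_train == false` never allocates the cache, which is the inference-only behavior raised earlier in the thread. How the real state object is created and wired to the forward and backward functions is shown in the rnn.cu and rnn-inl.h links above.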