Hi, thanks a lot for your great work!
I have a question about the Double DQN: maybe the following code needs a stop_gradient?
target_q = rewards + (gamma*double_q * (1-terminal_flags))
The double_q comes from the target DQN. When updating the main DQN, the error will be backpropagated to the target DQN if we don't stop the flow, right? So do we need to stop the gradient as follows?
target_q = tf.stop_gradient(target_q)
Could you please give some advice? Thanks.
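To make the question concrete, here is a minimal sketch in plain Python rather than TensorFlow. It assumes toy 1-D linear Q-functions and a squared TD error, and derives the gradients by hand. The names w_main, w_target and td_loss_grads, and all the numbers, are hypothetical and do not come from the repository. The stop_gradient flag mimics what tf.stop_gradient(target_q) does, which is to treat target_q as a constant during backpropagation.

```python
# Toy 1-D linear Q-functions (hypothetical, for illustration only):
#   q_main(s)   = w_main * s
#   q_target(s) = w_target * s

def td_loss_grads(w_main, w_target, s, s_next, reward, gamma, terminal,
                  stop_gradient):
    """Return (loss, dloss/dw_main, dloss/dw_target) for a squared TD error."""
    double_q = w_target * s_next  # value supplied by the target network
    target_q = reward + (gamma * double_q * (1 - terminal))
    td_error = w_main * s - target_q
    loss = td_error ** 2

    # Gradient with respect to the main network weight (chain rule).
    grad_main = 2 * td_error * s

    if stop_gradient:
        # target_q is treated as a constant, so nothing flows into w_target.
        grad_target = 0.0
    else:
        # Without the stop, the loss also depends on w_target through target_q.
        grad_target = -2 * td_error * gamma * (1 - terminal) * s_next

    return loss, grad_main, grad_target


# Made-up sample transition.
args = dict(w_main=0.5, w_target=1.0, s=2.0, s_next=3.0,
            reward=1.0, gamma=0.9, terminal=0)

loss_a, g_main_a, g_target_a = td_loss_grads(stop_gradient=False, **args)
loss_b, g_main_b, g_target_b = td_loss_grads(stop_gradient=True, **args)

print("without stop_gradient:", g_main_a, g_target_a)
print("with stop_gradient:   ", g_main_b, g_target_b)
```

In this sketch the loss and the gradient for the main weight are the same in both cases. Only the gradient reaching the target weight differs. Whether that gradient actually changes the target network depends on which variables the optimizer is asked to update, and the snippet in the question does not show that. If only the main DQN's variables are updated, the stop is harmless but redundant. If the optimizer updates every trainable variable, the stop is needed to keep the target DQN fixed.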