Should `F.softmax(Q_targets_next, dim=1)` be `F.softmax(Q_targets_next / entropy_tau, dim=1)` instead?
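To illustrate why the distinction could matter, here is a minimal sketch in plain Python. It assumes the policy is meant to be a softmax over `Q / entropy_tau`, which is common in entropy-regularized targets but is not confirmed here. The `softmax` helper, the Q-values, and the temperature value are all hypothetical; the helper mirrors what `F.softmax(x / tau, dim=1)` would compute for a single row.

```python
import math

def softmax(row, tau=1.0):
    """Temperature-scaled softmax over one row of Q-values (hypothetical helper)."""
    scaled = [q / tau for q in row]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

q_next = [1.0, 2.0, 3.0]  # hypothetical Q-values for one state
entropy_tau = 0.03        # hypothetical temperature, assumed small

pi_plain = softmax(q_next)             # analogous to F.softmax(Q, dim=1)
pi_tau = softmax(q_next, entropy_tau)  # analogous to F.softmax(Q / tau, dim=1)

print(pi_plain)
print(pi_tau)
```

With a small temperature, the scaled version puts nearly all of its mass on the largest Q-value, while the unscaled version stays much flatter. If the rest of the target uses `entropy_tau`, the two forms would therefore give noticeably different policies.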