1 file changed: +1 -1 lines changed
tutorials/sphinx-tutorials
#
# This type of algorithms is usually trained *on-policy*. This means that, at every learning iteration, we have a
# **sampling** and a **training** phase. In the **sampling** phase of iteration :math:`t`, rollouts are collected
- # form agents' interactions in the environment using the current policies :math:`\mathbf{\pi}_t`.
+ # from agents' interactions in the environment using the current policies :math:`\mathbf{\pi}_t`.
# In the **training** phase, all the collected rollouts are immediately fed to the training process to perform
# backpropagation. This leads to updated policies which are then used again for sampling.
# The execution of this process in a loop constitutes *on-policy learning*.
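
The sampling/training loop described in the comment above can be sketched in plain Python. This is a minimal illustration, not code from the tutorial: the toy reward, the REINFORCE-style update, and every name (`collect_rollouts`, `train`, `on_policy_loop`) are assumptions chosen for the example, and the tutorial's own algorithm and API are not visible in this diff.

```python
import math
import random


def softmax(logits):
    """Turn a list of logits into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def collect_rollouts(policies, n_steps, rng):
    """Sampling phase: every agent acts with its *current* policy pi_t."""
    rollouts = []
    for _ in range(n_steps):
        for agent, logits in enumerate(policies):
            probs = softmax(logits)
            action = rng.choices(range(len(probs)), weights=probs)[0]
            reward = 1.0 if action == 1 else 0.0  # toy reward (assumption)
            rollouts.append((agent, action, reward))
    return rollouts


def train(policies, rollouts, lr):
    """Training phase: one REINFORCE-style step on the fresh rollouts."""
    new_policies = [list(logits) for logits in policies]
    baseline = sum(r for _, _, r in rollouts) / len(rollouts)
    for agent, action, reward in rollouts:
        probs = softmax(policies[agent])
        for a in range(len(probs)):
            # Gradient of log softmax w.r.t. logit a for the taken action.
            grad_log_prob = (1.0 if a == action else 0.0) - probs[a]
            step = lr * (reward - baseline) * grad_log_prob / len(rollouts)
            new_policies[agent][a] += step
    return new_policies


def on_policy_loop(n_agents=2, n_iters=200, n_steps=32, lr=0.5, seed=0):
    rng = random.Random(seed)
    policies = [[0.0, 0.0] for _ in range(n_agents)]  # pi_0: uniform
    for _ in range(n_iters):
        rollouts = collect_rollouts(policies, n_steps, rng)  # uses pi_t
        policies = train(policies, rollouts, lr)  # yields pi_{t+1}
        # The rollouts are dropped here: they were generated by pi_t and
        # are not reused once the policies have changed.
    return policies


if __name__ == "__main__":
    for agent, logits in enumerate(on_policy_loop()):
        print(agent, softmax(logits))
```

The point of the sketch is the data flow: each batch of rollouts is produced by the policies that are about to be updated and is consumed exactly once, which is what distinguishes this loop from an off-policy setup with a replay buffer.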