This exercise explores Q-learning with both table-based and neural-network approaches.
- Take a look at the frozen-lake documentation and familiarize yourself with the setup. Recall the Q-Table update rule we saw during the lecture, or take a look at [1, equation (6.8) on page 153]:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$$

Here $\alpha$ is the learning rate, $\gamma$ the discount factor, $r_{t+1}$ the reward received after taking action $a_t$ in state $s_t$, and $s_{t+1}$ the state that follows.
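In code, the update is a single line; the following is a minimal sketch using a NumPy array indexed by state and action (the function name and the default values of `alpha` and `gamma` are illustrative, not taken from the exercise code):

```python
import numpy as np

def q_update(q_table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step: move Q(s, a) towards the bootstrapped target."""
    target = reward + gamma * np.max(q_table[next_state])
    q_table[state, action] += alpha * (target - q_table[state, action])
    return q_table
```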
- Navigate to `src/train_table_Q_frozen_lake.py` and resolve the `TODO`s by choosing actions from a Q-Table and implementing the Q-Table update rule.
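Inside the training loop, the resolved `TODO`s might look roughly like the sketch below. It assumes the script uses the Gymnasium `FrozenLake-v1` environment and a NumPy Q-table; the variable names and hyperparameters are guesses rather than the actual template.

```python
import gymnasium as gym
import numpy as np

env = gym.make("FrozenLake-v1", is_slippery=False)
q_table = np.zeros((env.observation_space.n, env.action_space.n))
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # placeholder hyperparameters

for episode in range(2000):
    state, _ = env.reset()
    done = False
    while not done:
        # Choose an action from the Q-table, with occasional random exploration.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(q_table[state]))
        next_state, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        # Apply the Q-table update rule from above.
        target = reward + gamma * np.max(q_table[next_state])
        q_table[state, action] += alpha * (target - q_table[state, action])
        state = next_state
```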
- Execute the script with `python src/train_table_Q_frozen_lake.py`. If you see your figure reach the goal in the short movie clip, you are done.
Note: If you run into a libGL error, this is usually caused by the conda installation. To fix it, execute the following in your terminal:

```bash
cd /home/$USER/miniconda3/lib
mkdir backup        # Create a new folder to keep the original libstdc++
mv libstd* backup   # Move all libstdc++ files into the folder, including soft links
cp /usr/lib/x86_64-linux-gnu/libstdc++.so.6 ./   # Copy the system's C++ dynamic library here
ln -s libstdc++.so.6 libstdc++.so
ln -s libstdc++.so.6 libstdc++.so.6.0.19
```

You might have to adjust the miniconda lib path to your environment; e.g. if you are using a custom environment named `aml`, change the first command to `cd /home/$USER/miniconda3/envs/aml/lib`. Source: Stack Overflow.
- Let's consider a more challenging example next. Navigate to `src/train_table_Q_tictactoe.py` and finish the `TicTacToeBoard` class in `gen.py`. Use `nox -s test` to check your progress. When `tests/test_board.py` checks out without an error, this task is done.
- Set up a Q-Table-based agent and train it. To do so, open `src/train_table_Q_tictactoe.py` and start with the main routine. Use a Python dictionary as your data structure for the Q-Table. The dictionary allows you to encode the board states as strings and access them easily with `q_table[str(state)]`. Use `create_explore_move` when sampling random moves from the set of allowed moves.
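A minimal sketch of the dictionary lookup, assuming nine action values per state (one per board cell); the helper name is made up and not part of the template:

```python
import numpy as np

q_table = {}  # maps str(state) -> array of action values

def get_q_values(q_table, state):
    """Look up the action values for a board state, creating a zero entry on first visit."""
    key = str(state)
    if key not in q_table:
        q_table[key] = np.zeros(9)  # one value per board cell (assumption)
    return q_table[key]
```

The greedy move is then `int(np.argmax(get_q_values(q_table, state)))`.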
- Implement the `create_explore_move` and `process_result` functions. Use `tests/test_explore_move.py` to test your code.
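For `create_explore_move`, something along the following lines should work; the signature (a JAX PRNG key plus the currently allowed moves) is a guess based on how the function is used here, not the template's actual interface, and `process_result` is not sketched because it depends on the board encoding:

```python
import jax
import jax.numpy as jnp

def create_explore_move(key, allowed_moves):
    """Return one of the currently allowed moves, chosen uniformly at random."""
    allowed = jnp.asarray(allowed_moves)
    return int(jax.random.choice(key, allowed))
```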
- Recycle your code from the frozen-lake example to implement Q-Table (or dict) learning. Train by playing games against a random opponent, which performs random moves all the time. Use your `create_explore_move` function for the opponent.
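Because TicTacToe only rewards the agent once a game ends, one way to recycle the frozen-lake update is to record the agent's `(state, action)` pairs during a game and run the update backwards once the result is known. The sketch below assumes exactly that setup; the function name, the reward convention, and the nine-entry value arrays are assumptions:

```python
import numpy as np

def update_from_game(q_table, trajectory, final_reward, alpha=0.1, gamma=0.9):
    """Run the Q-update backwards over one finished game.

    trajectory:   list of (str(state), action) pairs visited by the agent.
    final_reward: e.g. +1 for a win, -1 for a loss, 0 for a draw (assumption).
    """
    next_value = final_reward
    for state_key, action in reversed(trajectory):
        values = q_table.setdefault(state_key, np.zeros(9))
        values[action] += alpha * (next_value - values[action])
        # This state's best value becomes the bootstrapped target for the move before it.
        next_value = gamma * np.max(values)
    return q_table
```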
- Stop sampling from the Q-table every time for your agent. Pick the Q-table move with probability $\epsilon$ and perform an exploratory move with probability $1 - \epsilon$. Remember to use `jax.random.split` to generate new seeds for `jax.random.uniform`.
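A sketch of this gate, following the convention stated above (Q-table move with probability $\epsilon$, exploration otherwise); it assumes the dictionary entry for the state already exists, and the default $\epsilon$ is a placeholder:

```python
import jax
import jax.numpy as jnp
import numpy as np

def choose_move(key, q_table, state, allowed_moves, epsilon=0.8):
    """Q-table move with probability epsilon, exploratory move otherwise."""
    key, gate_key, explore_key = jax.random.split(key, num=3)
    if float(jax.random.uniform(gate_key)) < epsilon:
        move = int(np.argmax(q_table[str(state)]))  # exploit the Q-table
    else:
        move = int(jax.random.choice(explore_key, jnp.asarray(allowed_moves)))  # explore
    return key, move
```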
- Let's replace the Q-table with a neural network [2, Algorithm 1]. For simplicity, we do not implement batching; this works here, but won't scale to harder problems. Re-use as much code from task two as possible. Open `src/train_neural_Q_tictactoe.py`. Your `board_update` and `create_explore_move` functions are already imported. Recall the cost function for neural Q-learning:

$$C(\theta) = \big( y_t - Q(s_t, a_t; \theta) \big)^2$$

with $y_t = r_t$ if the game ends at step $t$ and $y_t = r_t + \gamma \max_{a'} Q(s_{t+1}, a'; \theta)$ otherwise. Feed the cost function into `jax.grad` and update your weights using `jax.tree_map(lambda w, g: w - alpha*g, weights, grads)`.
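A compact sketch of the resulting gradient step, assuming the network is a small dense net stored as a pytree of weight dictionaries; the layer sizes, helper names, and hyperparameters are illustrative and not from the template:

```python
import jax
import jax.numpy as jnp

def init_weights(key, sizes=(9, 64, 9)):
    """Tiny dense network: a 9-value board goes in, one Q-value per cell comes out."""
    weights = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        key, sub = jax.random.split(key)
        weights.append({"w": jax.random.normal(sub, (n_in, n_out)) * 0.1,
                        "b": jnp.zeros(n_out)})
    return weights

def q_values(weights, state):
    x = state
    for layer in weights[:-1]:
        x = jnp.tanh(x @ layer["w"] + layer["b"])
    return x @ weights[-1]["w"] + weights[-1]["b"]

def cost(weights, state, action, target):
    """Squared error between the bootstrapped target and Q(s, a; theta)."""
    return (target - q_values(weights, state)[action]) ** 2

grad_fn = jax.grad(cost)  # gradient with respect to the weights

def train_step(weights, state, action, reward, next_state, done, alpha=0.01, gamma=0.9):
    # The target is passed in as a plain argument, so jax.grad treats it as a constant.
    target = reward if done else reward + gamma * jnp.max(q_values(weights, next_state))
    grads = grad_fn(weights, state, action, target)
    # The update from the text; jax.tree_map is the older alias of jax.tree_util.tree_map.
    return jax.tree_util.tree_map(lambda w, g: w - alpha * g, weights, grads)
```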
`play.py` allows playing against the trained TicTacToe agents.
- [1] Sutton and Barto, Reinforcement Learning: An Introduction, https://www.andrew.cmu.edu/course/10-703/textbook/BartoSutton.pdf
- [2] Mnih et al., Playing Atari with Deep Reinforcement Learning, https://arxiv.org/pdf/1312.5602.pdf
- [3] Karpathy, Pong from Pixels, https://karpathy.github.io/2016/05/31/rl/