A summary that briefly touches upon Q-learning
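For background: tabular Q-learning maintains a table of state-action values and nudges it after every step. Below is a minimal, illustrative sketch of the update rule only; the actual training loop, hyperparameters, and exploration schedule live in `main.py` and may differ.

```python
import numpy as np

def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update on a (states x actions) numpy table.
    The alpha/gamma defaults here are illustrative, not the project's values."""
    best_next = np.max(Q[next_state])                      # value of the greedy next action
    td_target = reward + gamma * best_next                 # bootstrapped target
    Q[state, action] += alpha * (td_target - Q[state, action])
    return Q
```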
Requires Python 3
$ pip install -r requirements.txt
$ python main.py
The training generates a CSV containing the Q-table, named `alpha_{alpha_value}_gamma_{gamma_value}_score_{score}__{timestamp}.csv`.
An example is included in the examples folder of this repository.
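For illustration, a Q-table could be written out with that naming scheme roughly as follows. This is a sketch only; the use of pandas and the column layout are assumptions, and the real saving code in `main.py` may differ.

```python
import time
import pandas as pd

def save_qtable(Q, alpha, gamma, score):
    """Hypothetical helper that mirrors the filename pattern above."""
    timestamp = int(time.time())
    filename = f"alpha_{alpha}_gamma_{gamma}_score_{score}__{timestamp}.csv"
    pd.DataFrame(Q).to_csv(filename, index=False)
    return filename
```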
$ python test.py <path-to-qtable.csv>
- The taxi agent generally trains to a best average reward of ~9 within 50k episodes.
- The taxi agent achieves an average reward of 8.5+ over a test of 100 episodes (a sketch of such an evaluation follows).
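A minimal sketch of such an evaluation, loading a saved Q-table and acting greedily for 100 episodes, is shown below. The environment id, the old Gym reset/step API, and the CSV layout are assumptions; `test.py` may be implemented differently.

```python
import gym
import numpy as np
import pandas as pd

def evaluate(qtable_path, episodes=100):
    Q = pd.read_csv(qtable_path).to_numpy()   # assumes a (states x actions) table
    env = gym.make("Taxi-v2")                 # "Taxi-v1" in older Gym releases
    total = 0.0
    for _ in range(episodes):
        state, done = env.reset(), False
        while not done:
            action = int(np.argmax(Q[state]))            # greedy action from the Q-table
            state, reward, done, _ = env.step(action)
            total += reward
    return total / episodes

# average = evaluate("<path-to-qtable.csv>")  # compare against Gym's 9.7 "solved" threshold
```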
+---------+
|R: | : :G|
| : : : : |
| : : : : |
| | : | : |
|Y| : |B: |
+---------+
- The map is a `5x5` gridworld.
- The letters `R`, `G`, `B`, `Y` mark 4 locations.
- A passenger can be at any of the 4 locations.
- A passenger's destination can be any of the remaining 3 locations.
- The pipe symbol `|` denotes a wall.
- The colon symbol `:` denotes a passage.
- The taxi can pass through `:` but not `|`.
- The environment rewards `20` points when the passenger is dropped off at their destination.
- The environment gives a reward of `-10` if a pickup is attempted on a cell with no passenger.
- The environment gives a reward of `-10` if a drop-off is attempted when no passenger has boarded the taxi.
- The environment gives a reward of `-1` for every other action (see the interaction sketch after this list).
- At the start, the taxi will be at any of the 25 positions on the map (from Environment description[1]).
- The passenger will be at one of the `R`, `G`, `B`, `Y` locations.
- The destination will be one of the `R`, `G`, `B`, `Y` locations.
- The taxi must reach the passenger by travelling the shortest path.
- The taxi must pick up the passenger.
- The taxi must find the shortest path to the passenger's destination.
- The taxi must drop the passenger off at their destination, traversing the shortest path.
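The rules above can be checked interactively with a few lines of Gym code. A minimal sketch follows; the environment id is an assumption (older Gym releases expose `Taxi-v1`, newer ones `Taxi-v2`/`Taxi-v3` with the same rules).

```python
import gym

env = gym.make("Taxi-v2")   # "Taxi-v1"/"Taxi-v3" depending on the Gym version
state = env.reset()         # taxi position, passenger location and destination are randomized
env.render()                # prints the 5x5 map shown above

# Actions: 0=south, 1=north, 2=east, 3=west, 4=pickup, 5=dropoff
state, reward, done, info = env.step(4)
print(reward)               # -10 if no passenger is at the taxi's cell, otherwise -1
```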
OpenAI Gym defines "solving" the Taxi-v1 task as getting an average return of 9.7 over 100 consecutive trials.