Description
Hi, I'm trying to create a custom Uno environment with Stable Baselines3. I've run into two or three problems and would appreciate suggestions, workarounds, or a straightforward way to handle them:
My version of Uno is simplified for now (only 0-9 cards of 4 colours). The player either plays a valid card or picks a new one. I use a MultiDiscrete action space for this.
self.action_space = spaces.MultiDiscrete([2, 40])
The 2 stands for the two possible action types (play a card or draw one), and the 40 for the 40 distinct cards that can be played in this version of Uno. In the step(action) function, this makes it really hard for the algorithm to learn, because it generates invalid actions (either the card is not in the agent's hand, or the card can't be played under Uno's rules).
I also tried having the agent choose an index into the cards in its hand instead. The problem is that the hand size varies during the game, and I don't think the action space can change from MultiDiscrete([2, 7]) to MultiDiscrete([2, 9]) mid-game.
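One possible workaround for the varying hand size, as a rough sketch only: keep the action space fixed at a maximum number of hand slots and treat the unused slots as padding. The names `MAX_HAND_SIZE`, `card_for_slot`, and `slot_mask` are hypothetical, and the cap of 20 is an assumed bound, not something taken from the environment above.

```python
from typing import List, Optional

# Assumed upper bound on the hand size; the action space would then be
# fixed at MultiDiscrete([2, MAX_HAND_SIZE]) for the whole game.
MAX_HAND_SIZE = 20


def card_for_slot(hand: List[int], slot: int) -> Optional[int]:
    """Return the card id stored in a hand slot, or None if the slot is empty."""
    if 0 <= slot < min(len(hand), MAX_HAND_SIZE):
        return hand[slot]
    return None


def slot_mask(hand: List[int]) -> List[bool]:
    """True for slots that currently hold a card, False for padding slots."""
    filled = min(len(hand), MAX_HAND_SIZE)
    return [i < filled for i in range(MAX_HAND_SIZE)]
```

The agent can still pick an empty slot with this scheme, so it only fixes the shape of the action space; the invalid choices would still need a penalty or some form of masking.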
What I would like to do instead is filter the possible actions before the action is generated, and feed only that list to the algorithm, so that the step function only ever receives valid actions. Is that possible?
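To make the kind of filtering meant here concrete, below is a minimal sketch of a boolean action mask. It rests on several assumptions that are not in the environment above: the action space is flattened to a single Discrete(41) (40 cards plus one "draw" action), cards are encoded as `colour * 10 + value`, and a card is playable when it matches the top card's colour or value. The helpers `decode` and `valid_action_mask` are hypothetical names. As far as I understand, sb3-contrib's MaskablePPO can consume a mask like this (as a NumPy array) through an `action_masks()` method on the environment, but I have not verified that this fits the setup here.

```python
from typing import List, Tuple

NUM_COLOURS = 4
NUM_VALUES = 10
NUM_CARDS = NUM_COLOURS * NUM_VALUES  # 40 distinct cards
DRAW_ACTION = NUM_CARDS               # index 40 means "draw a card"


def decode(card_id: int) -> Tuple[int, int]:
    """Assumed encoding: card_id = colour * 10 + value."""
    return divmod(card_id, NUM_VALUES)


def valid_action_mask(hand: List[int], top_card: int) -> List[bool]:
    """Return one flag per action: True if the action is currently legal."""
    top_colour, top_value = decode(top_card)
    mask = [False] * (NUM_CARDS + 1)
    for card_id in hand:
        colour, value = decode(card_id)
        # Assumed rule: a card is playable if colour or value matches the top card.
        if colour == top_colour or value == top_value:
            mask[card_id] = True
    # Drawing a card is always allowed in this simplified game.
    mask[DRAW_ACTION] = True
    return mask
```

A single flat space is used here because, with MultiDiscrete([2, 40]), each dimension is masked independently, which cannot express "this card is only valid together with the play action".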
Here is my step function. Let me know if you need more code:
`def step(self, action: ActType):
    self.render()
    # action[0] is the action type (play or draw), action[1] is the card id.
    card_to_play = UnoCard.decode_from_int(action[1])

    # Invalid action: the chosen card is not in the agent's hand. End the episode.
    if card_to_play.code not in [c.code for c in self.agent.cards]:
        self.cards_not_in_hand += 1
        return self.get_observations(), CARD_NOT_IN_HAND_REWARD, True, False, {}
    # Invalid action: the card is in hand but can't be played. End the episode.
    elif not self.uno_game.is_play_valid(card_to_play):
        self.illegal_moves += 1
        return self.get_observations(), ILLEGAL_MOVE_REWARD, True, False, {}

    # Valid action: apply the agent's move.
    self.uno_game.play(action[0], card_to_play)
    if self.uno_game.is_game_over():
        self.wins += 1
        return self.get_observations(), WIN_REWARD, True, False, {}

    # second player plays
    act_type, card = self.uno_game.get_current_player().choose_action(self.uno_game)
    self.uno_game.play(act_type, card)
    if self.uno_game.is_game_over():
        self.losses += 1
        return self.get_observations(), LOSS_REWARD, True, False, {}

    # Game continues: reward the valid move.
    self.valid_moves += 1
    return self.get_observations(), VALID_MOVE_REWARD, False, False, {}`
Let me know how you would approach this. Maybe there is another way entirely?