
[question] Trying to make a Uno Env #1199

@saledemon

Hi, I'm trying to create a custom Uno environment with Stable-Baselines3. I've run into two or three problems and would appreciate suggestions, workarounds, or a straightforward way to handle them:

My version of Uno is simplified for now (only the 0-9 cards in 4 colours). On each turn, the player either plays a valid card or draws a new one. I use a MultiDiscrete action space for this.

self.action_space = spaces.MultiDiscrete([2, 40])

The 2 stands for the two possible action types (play a card or draw one). The 40 is there because there are 40 different cards you can play in this version of Uno. In the step(action) function, this makes it really hard for the algorithm to learn, because it generates invalid actions (either the card is not in the agent's hand, or the card can't be played according to Uno's rules).

I also tried having the agent choose an index into the cards in its hand instead. The problem is that the hand size varies during the game, and I don't think the action space can change from MultiDiscrete([2, 7]) to MultiDiscrete([2, 9]) mid-game.
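One way the hand-index idea could keep a fixed action space is to pad the hand to a constant number of slots and mark which slots are occupied. The following is only a minimal sketch: `MAX_HAND`, `pad_hand`, `slot_mask`, and the `(colour, rank)` tuple representation are hypothetical names chosen for illustration, not part of the environment above, and the cap of 20 cards is an assumption.

```python
from typing import List, Optional, Tuple

# Hypothetical card representation: (colour 0-3, rank 0-9).
Card = Tuple[int, int]

# Assumed upper bound on hand size; the real game may need a different cap.
MAX_HAND = 20


def pad_hand(hand: List[Card], max_hand: int = MAX_HAND) -> List[Optional[Card]]:
    """Return the hand padded with None so it always has max_hand slots."""
    if len(hand) > max_hand:
        raise ValueError("hand exceeds the assumed maximum size")
    return list(hand) + [None] * (max_hand - len(hand))


def slot_mask(hand: List[Card], max_hand: int = MAX_HAND) -> List[bool]:
    """True for slots that hold a card, False for padding slots."""
    return [slot is not None for slot in pad_hand(hand, max_hand)]


# Example: a two-card hand padded to five slots.
slots = pad_hand([(0, 1), (2, 3)], max_hand=5)
occupied = slot_mask([(0, 1), (2, 3)], max_hand=5)
```

With this layout the action space could stay at a constant size such as MultiDiscrete([2, MAX_HAND]), although the agent can still pick an empty slot unless something filters those choices out.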

What I would like to do instead is filter the possible actions before the algorithm generates an action for the step function, and feed it only that list of valid actions. Is that possible? That way, the step function would only ever receive valid actions.
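To make the filtering idea concrete, here is a minimal sketch of computing a validity mask. It assumes a flattened action space of 41 actions (40 cards plus one "draw" action) in place of MultiDiscrete([2, 40]), cards encoded as `colour * 10 + rank`, and the simplified rule that a card is playable if it matches the top card's colour or number. The names `encode`, `decode`, `is_playable`, and `action_mask` are hypothetical and not taken from the environment above.

```python
from typing import List, Tuple

NUM_COLOURS = 4
NUM_RANKS = 10
NUM_CARDS = NUM_COLOURS * NUM_RANKS  # 40 distinct cards
DRAW_ACTION = NUM_CARDS              # index 40 is the assumed "draw a card" action

# Hypothetical card representation: (colour 0-3, rank 0-9).
Card = Tuple[int, int]


def encode(card: Card) -> int:
    """Map a (colour, rank) card to an action index in [0, 39]."""
    colour, rank = card
    return colour * NUM_RANKS + rank


def decode(index: int) -> Card:
    """Inverse of encode: action index back to (colour, rank)."""
    colour, rank = divmod(index, NUM_RANKS)
    return (colour, rank)


def is_playable(card: Card, top: Card) -> bool:
    """Simplified rule: same colour or same number as the top card."""
    return card[0] == top[0] or card[1] == top[1]


def action_mask(hand: List[Card], top: Card) -> List[bool]:
    """Boolean mask over 41 actions; True marks an action that is valid now."""
    mask = [False] * (NUM_CARDS + 1)
    for card in hand:
        if is_playable(card, top):
            mask[encode(card)] = True
    mask[DRAW_ACTION] = True  # drawing is assumed to be always allowed
    return mask


# Example: only the cards matching the top card (plus "draw") are enabled.
mask = action_mask(hand=[(0, 3), (1, 5), (2, 7)], top=(1, 7))
valid_actions = [i for i, ok in enumerate(mask) if ok]
```

A mask like this only helps if the learning algorithm actually consumes it. As far as I know, sb3-contrib's MaskablePPO is built for this and reads the mask from an `action_masks()` method on the environment; that wiring is not shown here and should be checked against its documentation.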

Here is my step function. Let me know if you need more code:
```python
def step(self, action: ActType):
    self.render()

    # action[0] is the action type (play or draw); action[1] encodes the card.
    card_to_play = UnoCard.decode_from_int(action[1])

    # Invalid actions end the episode immediately with a penalty.
    if card_to_play.code not in [c.code for c in self.agent.cards]:
        self.cards_not_in_hand += 1
        return self.get_observations(), CARD_NOT_IN_HAND_REWARD, True, False, {}
    elif not self.uno_game.is_play_valid(card_to_play):
        self.illegal_moves += 1
        return self.get_observations(), ILLEGAL_MOVE_REWARD, True, False, {}

    # The agent's move is valid, so apply it.
    self.uno_game.play(action[0], card_to_play)

    if self.uno_game.is_game_over():
        self.wins += 1
        return self.get_observations(), WIN_REWARD, True, False, {}

    # second player plays
    act_type, card = self.uno_game.get_current_player().choose_action(self.uno_game)
    self.uno_game.play(act_type, card)

    if self.uno_game.is_game_over():
        self.losses += 1
        return self.get_observations(), LOSS_REWARD, True, False, {}

    # Nobody has won yet: reward the valid move and continue the episode.
    self.valid_moves += 1
    return self.get_observations(), VALID_MOVE_REWARD, False, False, {}
```

Let me know how you would approach this. Maybe there is another way entirely?
