[question] Trying to make a Uno Env

Hi, I'm trying to create a custom Uno environment with Stable Baselines3. I've run into two or three problems and would appreciate suggestions, workarounds, or a straightforward way to solve them:

My version of Uno is simplified for now (only the 0-9 cards in 4 colours). The player either plays a valid card or draws a new one. I use a MultiDiscrete action space for this.

`self.action_space = spaces.MultiDiscrete([2, 40])`

The 2 stands for the two possible action types (play a card or draw one), and the 40 for the 40 different cards you can play in this version of Uno. In the `step(action)` function, this makes it really hard for the algorithm to learn, because it generates invalid actions (either the card is not in the agent's hand or the card can't be played according to Uno's rules).
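One possible simplification, before dealing with invalid actions at all: since "draw" never needs a card, the two dimensions could be flattened into a single `Discrete(41)` space, with indices 0-39 for the cards and index 40 for drawing. Below is a minimal sketch of that mapping. The encoding `colour * 10 + number` and the names `encode_card` / `decode_action` are assumptions for illustration; the real `UnoCard.decode_from_int` may use a different layout.

```python
# Hypothetical card encoding (not taken from the post): colour * 10 + number.
COLOURS = ("red", "yellow", "green", "blue")
NUM_CARDS = len(COLOURS) * 10   # 40 distinct cards (0-9 in 4 colours)
DRAW_ACTION = NUM_CARDS         # index 40 means "draw a card"
NUM_ACTIONS = NUM_CARDS + 1     # size of a flat Discrete(41) action space


def encode_card(colour: str, number: int) -> int:
    """Map a (colour, number) pair to an action index in 0..39."""
    return COLOURS.index(colour) * 10 + number


def decode_action(action: int):
    """Map a flat action index back to ("play", card) or ("draw", None)."""
    if not 0 <= action < NUM_ACTIONS:
        raise ValueError(f"action {action} is outside 0..{NUM_ACTIONS - 1}")
    if action == DRAW_ACTION:
        return ("draw", None)
    colour_idx, number = divmod(action, 10)
    return ("play", (COLOURS[colour_idx], number))
```

With this layout the agent can no longer pick the nonsensical combination "draw + some card", which removes one source of wasted exploration.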

I also tried having the agent choose the index of a card in its hand instead. The problem is that the hand size varies during the game, and I don't think the action space can change from `MultiDiscrete([2, 7])` to `MultiDiscrete([2, 9])` mid-game.
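If the index-based idea is kept, one workaround is to fix the space at an upper bound on the hand size and mark the unused slots as unavailable. The sketch below only shows how such a slot mask could be built; `MAX_HAND_SIZE = 20` and the function name `hand_slot_mask` are assumptions, not values from my environment.

```python
MAX_HAND_SIZE = 20  # assumed upper bound on cards in hand; pick one that fits the game


def hand_slot_mask(hand_size: int, max_hand_size: int = MAX_HAND_SIZE) -> list[bool]:
    """Return one flag per hand slot: True if the slot currently holds a card."""
    if not 0 <= hand_size <= max_hand_size:
        raise ValueError(f"hand size {hand_size} is outside 0..{max_hand_size}")
    return [slot < hand_size for slot in range(max_hand_size)]
```

The drawback is that the meaning of "slot 3" changes whenever the hand is reordered, so a per-card encoding may be easier for the policy to learn.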

What I would like to do instead is filter the possible actions before the action for the step function is generated, and feed that list to the algorithm, so that the step function only receives valid actions. Is that possible?
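This kind of filtering is usually called action masking. As far as I can tell, `MaskablePPO` from sb3-contrib supports it by asking the environment for a boolean mask through an `action_masks()` method. The following is a minimal sketch of how such a mask could be computed, in plain Python so it runs on its own. It assumes a flat layout of 41 actions (one per card plus "draw"), a card code of `colour * 10 + number`, and that drawing is always allowed; none of these come from my actual `UnoCard` or `uno_game` classes.

```python
NUM_CARDS = 40            # 0-9 in 4 colours
DRAW_ACTION = NUM_CARDS   # assumed: index 40 means "draw a card"
NUM_ACTIONS = NUM_CARDS + 1


def legal_action_mask(hand: list[int], top_card: int) -> list[bool]:
    """Return a mask with True for every action that is legal right now.

    hand     -- card codes held by the agent (assumed code = colour * 10 + number)
    top_card -- code of the card on top of the discard pile
    """
    top_colour, top_number = divmod(top_card, 10)
    mask = [False] * NUM_ACTIONS
    for code in hand:
        colour, number = divmod(code, 10)
        # A card is playable if it matches the top card's colour or number.
        if colour == top_colour or number == top_number:
            mask[code] = True
    # Assumption: drawing is always allowed, so the mask is never all False.
    mask[DRAW_ACTION] = True
    return mask
```

An environment method `action_masks(self)` could return this list (or a NumPy array built from it), in which case the two early-return branches for invalid actions in `step` should never be hit during training.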

Here is my step function. Let me know if you need more code:
```python
def step(self, action: ActType):

    self.render()

    card_to_play = UnoCard.decode_from_int(action[1])

    if not (card_to_play.code in [c.code for c in self.agent.cards]):
        self.cards_not_in_hand += 1
        return self.get_observations(), CARD_NOT_IN_HAND_REWARD, True, False, {}
    elif not self.uno_game.is_play_valid(card_to_play):
        self.illegal_moves += 1
        return self.get_observations(), ILLEGAL_MOVE_REWARD, True, False, {}

    self.uno_game.play(action[0], card_to_play)

    if self.uno_game.is_game_over():
        self.wins += 1
        return self.get_observations(), WIN_REWARD, True, False, {}

    # second player plays
    act_type, card = self.uno_game.get_current_player().choose_action(self.uno_game)
    self.uno_game.play(act_type, card)

    if self.uno_game.is_game_over():
        self.losses += 1
        return self.get_observations(), LOSS_REWARD, True, False, {}

    self.valid_moves += 1
    return self.get_observations(), VALID_MOVE_REWARD, False, False, {}
```

Let me know how you would approach this. Maybe there is another way entirely?
