This repository provides a minimal, fully runnable implementation of a recurrent Soft Actor-Critic (RNN-SAC) agent for a partially observable "Blind Key Door" environment. The observation available to the agent is restricted to `[door_open, has_key, last_action]`, so the recurrent state must track the agent's belief about its location and progress.
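For orientation, the sketch below shows one plausible way such a three-part observation could be encoded. The helper name, one-hot layout, and action count are illustrative assumptions, not the exact encoding used in `blind_key_door_env.py`.

```python
import numpy as np

# Illustrative sketch only: the actual encoding lives in blind_key_door_env.py.
# The agent never observes its position, only these three signals.
def make_observation(door_open: bool, has_key: bool, last_action: int,
                     num_actions: int = 4) -> np.ndarray:
    last_action_onehot = np.zeros(num_actions, dtype=np.float32)
    if last_action >= 0:  # a negative value could mark "no previous action" at reset
        last_action_onehot[last_action] = 1.0
    return np.concatenate(([float(door_open), float(has_key)],
                           last_action_onehot)).astype(np.float32)
```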
- `blind_key_door_env.py` – Lightweight environment that exposes the key-door challenge with discrete actions and minimal observations.
- `rnn_sac_agent.py` – Implementation of the belief encoder (GRU), recurrent SAC policy, critics, and sequence replay buffer.
- `train.py` – Training and evaluation loop that ties the agent and environment together.
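As a rough picture of the architecture listed above, a GRU belief encoder along the following lines maps observation sequences to belief states. The class and argument names here are assumptions; consult `rnn_sac_agent.py` for the actual interface.

```python
import torch
import torch.nn as nn
from typing import Optional, Tuple

class BeliefEncoder(nn.Module):
    """Encodes an observation sequence into a belief state with a GRU (sketch)."""

    def __init__(self, obs_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.gru = nn.GRU(obs_dim, hidden_dim, batch_first=True)

    def forward(self, obs_seq: torch.Tensor,
                hidden: Optional[torch.Tensor] = None) -> Tuple[torch.Tensor, torch.Tensor]:
        # obs_seq: (batch, seq_len, obs_dim); hidden: (1, batch, hidden_dim)
        belief_seq, hidden = self.gru(obs_seq, hidden)
        # The per-step belief feeds the policy and critic heads; the final hidden
        # state is carried across steps when the agent acts online.
        return belief_seq, hidden
```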
Install the required packages before running the training script:

```bash
pip install torch gymnasium numpy
```
The code automatically uses a CUDA device if one is available; otherwise it falls back to CPU.
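The selection amounts to the standard PyTorch idiom:

```python
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
```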
To train the agent with the default hyperparameters (15,000 episodes, sequence length of 20):

```bash
python train.py
```
During training the script periodically evaluates the policy in a deterministic version of the environment (no action slip). After training completes, the script renders one evaluation episode in the terminal.
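Conceptually, each evaluation pass looks like the sketch below. The environment class name, its `slip_prob` constructor argument, and the `agent.act` interface are assumptions made for illustration.

```python
from blind_key_door_env import BlindKeyDoorEnv  # module from this repo; class name assumed

def evaluate(agent, episodes: int = 10) -> float:
    """Run the policy without action slip and return the mean episode return (sketch)."""
    env = BlindKeyDoorEnv(slip_prob=0.0)  # deterministic evaluation environment
    returns = []
    for _ in range(episodes):
        obs, _ = env.reset()
        hidden, done, total = None, False, 0.0
        while not done:
            action, hidden = agent.act(obs, hidden, deterministic=True)  # assumed interface
            obs, reward, terminated, truncated, _ = env.step(action)
            done = terminated or truncated
            total += reward
        returns.append(total)
    return sum(returns) / len(returns)
```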
To experiment with different settings (e.g., fewer episodes during development), import the helper function and override the defaults:

```python
from train import train_rnn_sac

agent, train_returns, eval_returns = train_rnn_sac(episodes=2000, batch_size=32, seq_len=16)
```
- Intrinsic rewards such as curiosity or boredom can be incorporated in `train.py` by augmenting the reward signal before it is stored in the replay buffer (see the sketch after this list).
- The environment uses a configurable `slip_prob` parameter to introduce stochasticity during training while keeping evaluation deterministic.
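A minimal sketch of both points, assuming a Gymnasium-style environment class named `BlindKeyDoorEnv` and a simple count-based bonus as the intrinsic signal. The replay-buffer call is commented out because its exact signature depends on the sequence buffer in `rnn_sac_agent.py`.

```python
from blind_key_door_env import BlindKeyDoorEnv  # module from this repo; class name assumed

# Hypothetical count-based curiosity bonus: rarely seen observations earn extra reward.
visit_counts: dict = {}

def intrinsic_bonus(obs_key, scale: float = 0.1) -> float:
    visit_counts[obs_key] = visit_counts.get(obs_key, 0) + 1
    return scale / visit_counts[obs_key] ** 0.5

env = BlindKeyDoorEnv(slip_prob=0.1)  # stochastic training env; slip_prob comes from this README
obs, _ = env.reset()
action = env.action_space.sample()
next_obs, reward, terminated, truncated, _ = env.step(action)
reward += intrinsic_bonus(tuple(next_obs))  # augment before the transition is stored
# replay_buffer.add(obs, action, reward, next_obs, terminated)  # assumed buffer interface
```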