Clean and easy-to-understand implementations of many Quantum Reinforcement Learning agents as well as their classical analogues. Greatly inspired by the original CleanRL.

CleanQRL (Clean Quantum Reinforcement Learning)


CleanQRL is a Reinforcement Learning library specifically tailored to the subfield of Quantum Reinforcement Learning and is greatly inspired by the amazing work of CleanRL. Just like its classical analogue, we aim to provide high-quality single-file implementations with research-friendly features. The implementations mainly follow the ideas of CleanRL and are clean and simple, yet scale nicely through additional features such as Ray Tune. The main features of this repository are:

  • 📜 Single-file implementations of classical and quantum versions of 4+ Reinforcement Learning agents
  • 💾 Tuned and benchmarked agents (with available configs)
  • 🎮 Integration of Gymnasium, MuJoCo and Jumanji
  • 📘 Examples of how to enhance the standard QRL agents on a variety of games
  • 📈 TensorBoard logging
  • 🌱 Local reproducibility via seeding
  • 🧫 Experiment management with Weights and Biases
  • 📊 Easy and straightforward hyperparameter tuning with Ray Tune (see the sketch right after this list)
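
As a rough illustration of the Ray Tune integration, here is a minimal sketch. It is not the repository's own tuning entry point: the import path of the agent, the extra path key, and the searched parameters are assumptions, and the agent has to report its metrics to Ray for the search to be useful, so treat it purely as an illustration of the interface.

from ray import tune

from cleanqrl.reinforce_quantum import Config, reinforce_quantum  # assumed import path

param_space = vars(Config())                              # start from the default config
param_space["path"] = "logs/ray_tune_example"             # assumed logging key (see the config section below)
param_space["lr_weights"] = tune.loguniform(1e-4, 1e-1)   # search the learning rate of the variational parameters
param_space["num_layers"] = tune.choice([1, 2, 4])        # and the depth of the quantum circuit

tuner = tune.Tuner(
    reinforce_quantum,                                    # the agent is a plain function of a config dict
    param_space=param_space,
    tune_config=tune.TuneConfig(num_samples=20),
)
tuner.fit()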

What we are missing compared to CleanRL:

  • 💸 Cloud Integration with docker and AWS
  • 📹 Video capturing of gameplay

You can read more about CleanQRL in our upcoming paper.

Get started

Installation

To run experiments locally, you need to clone the repository and create the Python environment:

git clone https://github.com/fhg-iisb-mki/cleanqrl.git
cd cleanqrl
conda env create -f environment.yaml

That's it, now you're set up!

Run first experiments

Each agent can be run as a single file, either from the parent directory or directly in the subfolder. First, activate the cleanqrl environment and then execute the algorithm's Python file:

conda activate cleanqrl
python cleanqrl/reinforce_quantum.py

or go directly into the folder and execute

conda activate cleanqrl
cd cleanqrl 
python reinforce_quantum.py 

Before you execute the files, customize the parameters in the Config class at the end of each file. Every file contains such a dataclass, and the algorithm is callable as a function that takes the config as input:

def reinforce_quantum(config):
    num_envs = config["num_envs"]
    total_timesteps = config["total_timesteps"]
    env_id = config["env_id"]
    gamma = config["gamma"]
    lr_input_scaling = config["lr_input_scaling"]
    lr_weights = config["lr_weights"]
    lr_output_scaling = config["lr_output_scaling"]
    .... 

This function can also be called from an external file (see below for details). But first, let's have a closer look at the Config:

@dataclass
class Config:
    # General parameters
    trial_name: str = 'reinforce_quantum'  # Name of the trial
    trial_path: str = 'logs'  # Path to save logs relative to the parent directory
    wandb: bool = False # Use wandb to log experiment data 
    project_name: str = "cleanqrl"  # If wandb is used, name of the wandb-project

    # Environment parameters
    env_id: str = "CartPole-v1" # Environment ID
    
    # Algorithm parameters
    num_envs: int = 1  # Number of environments
    total_timesteps: int = 100000  # Total number of timesteps
    gamma: float = 0.99  # discount factor
    lr_input_scaling: float = 0.01  # Learning rate for input scaling
    lr_weights: float = 0.01  # Learning rate for variational parameters
    lr_output_scaling: float = 0.01  # Learning rate for output scaling
    cuda: bool = False  # Whether to use CUDA
    num_qubits: int = 4  # Number of qubits
    num_layers: int = 2  # Number of layers in the quantum circuit
    device: str = "default.qubit"  # Quantum device
    diff_method: str = "backprop"  # Differentiation method
    save_model: bool = True # Save the model after the run

As you can see, the config is divided into 3 parts:

  • General parameters: Here the name of your experiment as well as the logging path are defined. All metrics are logged to a result.json file in the result folder, which is prefixed with the time of the experiment execution. You can also use wandb for enhanced metric logging.
  • Environment parameters: In the simplest case, this is just the string ID of the gym environment. For jumanji environments as well as for your custom environments, you can also specify additional parameters here (see the Tutorials for details).
  • Algorithm parameters: All algorithm hyperparameters are specified here. For details on the parameters, see the algorithms section.
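
If you prefer not to edit the defaults at the end of the file, the same fields can also be overridden when the dataclass is instantiated, for example from your own script. A small sketch using only the fields shown above (the chosen values are illustrative):

config = vars(Config(
    trial_name="reinforce_quantum_cartpole",  # name of this particular run
    env_id="CartPole-v1",                     # any Gymnasium environment ID
    num_layers=4,                             # deeper quantum circuit
    total_timesteps=50000,
))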

Once you execute the file, it will create the log subfolder and copy the config used for the experiment into that folder:

    config = vars(Config())
    
    # Based on the current time, create a unique name for the experiment
    config['trial_name'] = datetime.now().strftime("%Y-%m-%d--%H-%M-%S") + '_' + config['trial_name']
    config['path'] = os.path.join(Path(__file__).parent.parent, config['trial_path'], config['trial_name'])

    # Create the directory and save a copy of the config file so that the experiment can be replicated
    os.makedirs(os.path.dirname(config['path'] + '/'), exist_ok=True)
    config_path = os.path.join(config['path'], 'config.yml')
    with open(config_path, 'w') as file:
        yaml.dump(config, file)

    # Start the agent training 
    reinforce_quantum(config)   

After the execution, the experiment data is saved e.g. at:

...
configs
examples
logs/
    2025-03-04--14-59-32_reinforce_quantum          # The name of your experiment
        config.yml                                  # Config which was used to run this experiment
        result.json                                 # Results of the experiment
.gitignore
...
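
Because a copy of the config is stored next to the results, a finished experiment can be replicated by loading that file and passing it back to the agent function. A minimal sketch, assuming the agent is importable as a module and reusing the example folder name from above:

import os
import yaml

from cleanqrl.reinforce_quantum import reinforce_quantum  # assumed import path

# Load the config that was saved when the original experiment was run
config_path = os.path.join("logs", "2025-03-04--14-59-32_reinforce_quantum", "config.yml")
with open(config_path) as file:
    config = yaml.safe_load(file)

# Re-run the training with exactly the same hyperparameters
reinforce_quantum(config)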

You can also set the wandb variable to True:

@dataclass
class Config:
    # General parameters
    trial_name: str = 'reinforce_quantum_wandb'  # Name of the trial
    trial_path: str = 'logs'  # Path to save logs relative to the parent directory
    wandb: bool = True # Use wandb to log experiment data 
    project_name: str = "cleanqrl"  # If wandb is used, name of the wandb-project

    # Environment parameters
    env_id: str = "CartPole-v1" # Environment ID

You will need to log in to your wandb account before you run:

wandb login  # only required the first time
python cleanqrl/reinforce_quantum.py

This will create an additional folder for the wandb logging, and you can also inspect your experiment data online:

...
configs
examples
logs/
    2025-03-04--14-59-32_reinforce_quantum_wandb    # The name of your experiment
        wandb                                       # Subfolder of the wandb logging
        config.yml                                  # Config which was used to run this experiment
        result.json                                 # Results of the experiment
.gitignore
...

Algorithms

Contact and Community

We want to grow as a community, so GitHub issues and PRs are very welcome! If you are missing an algorithm, or have a specific problem you want to tailor a QRL algorithm to but are struggling to do so, you can also create a feature request!

Citing CleanQRL

If you use CleanQRL in your work, please cite our [paper]:

Citing CleanRL

If you used mainly the classical parts of our code in your work, please cite the original CleanRL paper:

@article{huang2022cleanrl,
  author  = {Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga and Dipam Chakraborty and Kinal Mehta and João G.M. Araújo},
  title   = {CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {274},
  pages   = {1--18},
  url     = {http://jmlr.org/papers/v23/21-1342.html}
}
