Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels
Authors: Sai Rajeswar, Pietro Mazzaglia, Tim Verbelen, Alexandre Piché, Bart Dhoedt, Aaron Courville, Alexandre Lacoste
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The approach is empirically evaluated through a large-scale empirical study, which we use to validate our design choices and analyze our models. An extensive empirical evaluation, supported by more than 2k experiments, among main results, analysis and ablations, was used to carefully study URLB and analyse our method. |
| Researcher Affiliation | Collaboration | *Equal contribution. 1Mila, Université de Montréal; 2ServiceNow Research; 3Ghent University - imec, Belgium; 4CIFAR Fellow. |
| Pseudocode | Yes | Algorithm 1: Dyna-MPC; Algorithm 2: Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels |
| Open Source Code | Yes | Project website: https://masteringurlb.github.io/ Details on the implementation are provided in Appendix B and the code is available on the project website. |
| Open Datasets | Yes | Recently, the Unsupervised RL Benchmark (URLB) (Laskin et al., 2021) established a common protocol to compare self-supervised algorithms across several domains and tasks from the DMC Suite (Tassa et al., 2018). |
| Dataset Splits | No | The paper describes a pre-training (PT) phase of up to "2M frames" and a fine-tuning (FT) phase of "100k frames" as interaction budgets with the environment (a minimal sketch of this two-phase protocol follows the table). However, it does not provide explicit training, validation, or test dataset splits with percentages or sample counts for static datasets, as typically found in supervised learning contexts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper refers to algorithms and optimizers (e.g., "Dreamer V2", "Adam") and provides their hyperparameters, but it does not specify software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x) that would be needed for replication. |
| Experiment Setup | Yes | The hyperparameters for the agent, which we keep fixed across all domains and tasks, can be found in Appendix I. Table 5. World model, actor-critic, planner (Dyna-MPC) and common hyperparameters. |
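For readers unfamiliar with the interaction-budget protocol referenced in the Dataset Splits row, the following is a minimal sketch of the two-phase setup: an unsupervised pre-training phase of up to 2M environment frames, followed by a task fine-tuning phase of 100k frames. The `Env` and `Agent` classes and all method names below are hypothetical placeholders, not the authors' implementation; a real agent in this paper would update Dreamer-style world-model and actor-critic components at each step.

```python
"""Sketch of the URLB-style two-phase interaction budget (pre-train, then
fine-tune). Env, Agent, run_phase, and intrinsic_reward are hypothetical
stand-ins used only to illustrate the protocol."""

import random

PT_FRAMES = 2_000_000  # unsupervised pre-training budget ("2M frames")
FT_FRAMES = 100_000    # task fine-tuning budget ("100k frames")


class Env:
    """Toy stand-in for a DMC Suite pixel environment."""

    def reset(self):
        return 0.0

    def step(self, action):
        obs = random.random()
        task_reward = random.random()
        done = random.random() < 0.01
        return obs, task_reward, done


class Agent:
    """Toy stand-in; a real agent would train a world model and actor-critic."""

    def act(self, obs):
        return random.random()

    def intrinsic_reward(self, obs):
        # Stand-in for the self-supervised signal used during pre-training.
        return random.random()

    def update(self, obs, action, reward):
        pass  # model/policy learning step goes here


def run_phase(env, agent, budget, use_task_reward):
    """Interact for `budget` frames, using either task or intrinsic reward."""
    obs, frames = env.reset(), 0
    while frames < budget:
        action = agent.act(obs)
        next_obs, task_reward, done = env.step(action)
        reward = task_reward if use_task_reward else agent.intrinsic_reward(next_obs)
        agent.update(obs, action, reward)
        obs = env.reset() if done else next_obs
        frames += 1


env, agent = Env(), Agent()
run_phase(env, agent, PT_FRAMES, use_task_reward=False)  # reward-free pre-training
run_phase(env, agent, FT_FRAMES, use_task_reward=True)   # task fine-tuning
```

The key point of the protocol is that the task reward is withheld during pre-training; the agent only sees a self-supervised (intrinsic) signal there, and the small 100k-frame fine-tuning budget is what makes the benchmark a test of how well pre-trained representations and behaviors transfer.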