Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels

Authors: Sai Rajeswar, Pietro Mazzaglia, Tim Verbelen, Alexandre Piché, Bart Dhoedt, Aaron Courville, Alexandre Lacoste

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The approach is evaluated through a large-scale empirical study used to validate the design choices and analyze the models: "An extensive empirical evaluation, supported by more than 2k experiments, among main results, analysis and ablations, was used to carefully study URLB and analyse our method."
Researcher Affiliation | Collaboration | *Equal contribution. 1 Mila, Université de Montréal; 2 ServiceNow Research; 3 Ghent University - imec, Belgium; 4 CIFAR Fellow.
Pseudocode | Yes | Algorithm 1 (Dyna-MPC) and Algorithm 2 (Mastering the Unsupervised Reinforcement Learning Benchmark from Pixels).
Open Source Code | Yes | Project website: https://masteringurlb.github.io/. "Details on the implementation are provided in Appendix B and the code is available on the project website."
Open Datasets | Yes | "Recently, the Unsupervised RL Benchmark (URLB) (Laskin et al., 2021) established a common protocol to compare self-supervised algorithms across several domains and tasks from the DMC Suite (Tassa et al., 2018)."
Dataset Splits | No | The paper describes a pre-training (PT) phase of up to "2M frames" and a fine-tuning (FT) phase of "100k frames" as interaction budgets with the environment, but it does not provide explicit training/validation/test splits with percentages or sample counts, as would be typical for static datasets in supervised learning.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper names algorithms and optimizers (e.g., "Dreamer V2", "Adam") and provides their hyperparameters, but does not list software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x) that would be needed for replication.
Experiment Setup | Yes | "The hyperparameters for the agent, which we keep fixed across all domains and tasks, can be found in Appendix I." Table 5 lists world model, actor-critic, planner (Dyna-MPC), and common hyperparameters.
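The two-phase interaction budget noted in the Dataset Splits row (reward-free pre-training for up to 2M frames, then task fine-tuning for 100k frames) can be sketched as below. This is a minimal illustration of frame-budget accounting, not the paper's implementation; the `ACTION_REPEAT` value and the `consume_budget` helper are assumptions introduced here for clarity.

```python
# Sketch of the URLB-style two-phase protocol: each phase is limited by an
# environment-frame budget, and each agent step consumes `action_repeat`
# frames. The budgets below come from the paper; the action repeat is assumed.

PT_BUDGET = 2_000_000   # reward-free pre-training budget ("2M frames")
FT_BUDGET = 100_000     # task fine-tuning budget ("100k frames")
ACTION_REPEAT = 2       # hypothetical value for illustration

def consume_budget(budget_frames: int, action_repeat: int = ACTION_REPEAT) -> int:
    """Return how many agent steps fit inside a frame budget."""
    steps = 0
    frames = 0
    while frames + action_repeat <= budget_frames:
        frames += action_repeat
        steps += 1
    return steps

pt_steps = consume_budget(PT_BUDGET)   # steps available for pre-training
ft_steps = consume_budget(FT_BUDGET)   # steps available for fine-tuning
print(pt_steps, ft_steps)
```

With an action repeat of 2, the 2M-frame pre-training budget corresponds to 1M agent steps and the 100k-frame fine-tuning budget to 50k steps, which is why frame budgets rather than dataset splits are the natural unit in this online-RL setting.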