Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Self-supervised Color Generalization in Reinforcement Learning
Authors: Matthias Weissenbacher, Evangelos Routis, Yoshinobu Kawahara
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate our method in the Minigrid, Procgen, and DeepMind Control suites and find improved color sensitivity and generalisation. |
| Researcher Affiliation | Collaboration | Matthias Weissenbacher (EMAIL): Riken Center for Advanced Intelligence Project; Pyr-SAI Labs, Japan. Evangelos Routis (EMAIL): Causaly, London, United Kingdom. Yoshinobu Kawahara: Riken Center for Advanced Intelligence Project; Osaka University, Japan. |
| Pseudocode | No | The paper describes algorithms like rDMD and CiL mathematically and in narrative text, but does not present them in a structured pseudocode or algorithm block. |
| Open Source Code | Yes | In section 4.2 we perform our main experiments on the Procgen environment. The code is made public at GitHub. |
| Open Datasets | Yes | We empirically evaluate our method in the Minigrid, Procgen, and DeepMind Control suites... The Lava Crossing environment, a standard in the MiniGrid toolkit (Chevalier-Boisvert et al., 2019)... The Procgen benchmark consists of sixteen procedurally generated games... Procgen generalization benchmark (Cobbe et al., 2020)... DeepMind Control suite (DMControl) (Tassa et al., 2018). |
| Dataset Splits | Yes | Following the setup from (Cobbe et al., 2020), agents are trained on a fixed set of n = 200 levels (generated using seeds from 1 to 200) and tested on the full distribution of levels (generated by sampling seeds uniformly at random from all computer integers). |
| Hardware Specification | Yes | All experiments were performed on NVIDIA GPU A-100 or V-100. |
| Software Dependencies | No | The paper mentions 'torch.svd' and algorithms 'PPO/DrAC' and 'SAC' but does not provide specific version numbers for these or other software libraries. |
| Experiment Setup | Yes | We summarize the hyperparameter choices in Table (5). Table 5: Architecture and hyper-parameter choices for CiL on Procgen, DMControl, Minigrid based on (Raileanu et al., 2020), (Hansen & Wang, 2021), and (Jiang et al., 2021), respectively. Channels refer to the category channels. We use the algorithms in the code-base without any hyper-parameter changes except for reduction of hidden-dim of the actor-critic networks to 64. The patch size follows the convention in Vision Transformers; for a 64x64 pixel input, we use 8x8 patches. |
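
The quoted split and patch settings can be made concrete with a little arithmetic. The following is a minimal sketch, not the authors' code: the names `TRAIN_SEEDS`, `sample_test_seeds`, and `patch_grid` are hypothetical, and the 32-bit upper bound on test seeds is an assumption standing in for "all computer integers". It shows that seeds 1 to 200 give 200 training levels, and that tiling a 64x64 input with non-overlapping 8x8 patches gives an 8 by 8 grid of patches.

```python
import random

# Training levels: seeds 1..200 inclusive, as quoted in the Dataset Splits row.
TRAIN_SEEDS = list(range(1, 201))


def sample_test_seeds(n, rng, max_seed=2**31 - 1):
    """Draw n test seeds uniformly at random.

    The bound max_seed is an assumption for illustration; the paper only
    says seeds are sampled from all computer integers.
    """
    return [rng.randint(0, max_seed) for _ in range(n)]


def patch_grid(height, width, patch=8):
    """Return the top-left (row, col) corner of each non-overlapping patch."""
    if height % patch or width % patch:
        raise ValueError("image size must be divisible by the patch size")
    return [
        (row, col)
        for row in range(0, height, patch)
        for col in range(0, width, patch)
    ]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    test_seeds = sample_test_seeds(5, rng)
    corners = patch_grid(64, 64, patch=8)
    print(len(TRAIN_SEEDS), len(test_seeds), len(corners))
```

Because the test seeds are drawn from the full range, they may occasionally coincide with a training seed; that is consistent with testing on the full distribution of levels rather than on a disjoint held-out set.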