Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Zero Shot Generalization of Vision-Based RL Without Data Augmentation
Authors: Sumeet Batra, Gaurav S. Sukhatme
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train on four challenging tasks from the Deep Mind Control Suite (Tassa et al., 2018). To evaluate zero-shot generalization capability, we periodically evaluate model performance under challenging distribution shifts from the DMControl Generalization Benchmark (Hansen & Wang, 2021) and the Distracting Control Suite (Stone et al., 2021) throughout training. Specifically, we have two evaluation environments: color hard, which randomizes the color of the agent and background to extreme RGB values, and distracting cs, which applies camera shaking and plays a random video in the background from the DAVIS 2017 dataset (Pont-Tuset et al., 2017). |
| Researcher Affiliation | Academia | 1Department of Computer Science, University of Southern California, Los Angeles, USA. Correspondence to: Sumeet Batra <EMAIL>, Gaurav Sukhatme <EMAIL>. |
| Pseudocode | Yes | A.8. ALDA Pseudocode. Algorithm 1 ALDA Forward Pass. Algorithm 2 Associative Latent Dynamics. |
| Open Source Code | No | The paper mentions "Our SAC implementation is based on (Yarats & Kostrikov, 2020)." which refers to a third-party implementation, but does not provide specific access information or an explicit statement about releasing the code for the methodology described in this paper. |
| Open Datasets | Yes | We train on four challenging tasks from the Deep Mind Control Suite (Tassa et al., 2018). To evaluate zero-shot generalization capability, we periodically evaluate model performance under challenging distribution shifts from the DMControl Generalization Benchmark (Hansen & Wang, 2021) and the Distracting Control Suite (Stone et al., 2021) throughout training... We do not expect to outperform SVEA since it uses additional data sampled from a dataset of 1.8 million diverse real-world scenes, likely putting the DMCGB evaluation tasks in-distribution... images sampled from the Places (Zhou et al., 2017) dataset. |
| Dataset Splits | Yes | We train on four challenging tasks from the Deep Mind Control Suite (Tassa et al., 2018). To evaluate zero-shot generalization capability, we periodically evaluate model performance under challenging distribution shifts from the DMControl Generalization Benchmark (Hansen & Wang, 2021) and the Distracting Control Suite (Stone et al., 2021) throughout training. Specifically, we have two evaluation environments: color hard, which randomizes the color of the agent and background to extreme RGB values, and distracting cs, which applies camera shaking and plays a random video in the background from the DAVIS 2017 dataset (Pont-Tuset et al., 2017). |
| Hardware Specification | No | The paper does not provide specific details on the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper states: "Our SAC implementation is based on (Yarats & Kostrikov, 2020)." This reference is to a PyTorch implementation but does not specify the version of PyTorch or any other software libraries used with their version numbers. |
| Experiment Setup | Yes | A.7.3. HYPERPARAMETERS. We list a set of common hyperparameters that are used in all domains. Table 1. Common hyperparameters for SAC and ALDA. Replay buffer capacity: 1e6; Batch size: 128; Latent model temperature β: 100; Number of latents \|zd\|: 12; Number of values per latent Vj: 12; Encoder weight decay λθ: 0.1; Decoder weight decay λϕ: 0.1; Frame stack: 3; Action repeat: 2 for finger spin, otherwise 4; Episode length: 100; Observation space: (9 x 64 x 64); Optimizer: Adam; Actor/Critic learning rate: 1e-3; Encoder/Decoder learning rate: 1e-3; Latent model learning rate: 1e-3; Temperature learning rate: 1e-4; Actor update frequency: 2; Critic update frequency: 2; Discount γ: 0.99. |
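
For readers who want to reuse the reported setup, the common hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. The following is only a minimal sketch, not the authors' code: the class name `ALDAConfig`, its field names, the task identifier `"finger_spin"`, and the assumption of 3 color channels per frame are all hypothetical choices made for illustration. Only the numeric values come from the quoted table.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ALDAConfig:
    """Hypothetical container for the common hyperparameters quoted above.

    Field names are illustrative; only the values are taken from the table.
    """
    replay_buffer_capacity: int = 1_000_000   # 1e6
    batch_size: int = 128
    latent_model_temperature: float = 100.0   # beta
    num_latents: int = 12
    values_per_latent: int = 12
    encoder_weight_decay: float = 0.1
    decoder_weight_decay: float = 0.1
    frame_stack: int = 3
    episode_length: int = 100
    image_size: int = 64
    optimizer: str = "Adam"
    actor_critic_lr: float = 1e-3
    encoder_decoder_lr: float = 1e-3
    latent_model_lr: float = 1e-3
    temperature_lr: float = 1e-4
    actor_update_frequency: int = 2
    critic_update_frequency: int = 2
    discount: float = 0.99                    # gamma

    def action_repeat(self, task: str) -> int:
        # The table reports 2 for finger spin and 4 otherwise.
        # The string "finger_spin" is an assumed task identifier.
        return 2 if task == "finger_spin" else 4

    def observation_shape(self, channels_per_frame: int = 3) -> Tuple[int, int, int]:
        # Assumes stacked RGB frames: 3 frames x 3 channels = 9 channels,
        # which matches the reported (9 x 64 x 64) observation space.
        return (self.frame_stack * channels_per_frame,
                self.image_size, self.image_size)


cfg = ALDAConfig()
print(cfg.observation_shape())
print(cfg.action_repeat("finger_spin"), cfg.action_repeat("walker_walk"))
```

Deriving the observation shape from the frame stack, rather than hard-coding it, makes the relationship between the two reported values explicit, under the RGB assumption stated above.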