Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Robust Policy Learning over Multiple Uncertainty Sets
Authors: Annie Xie, Shagun Sodhani, Chelsea Finn, Joelle Pineau, Amy Zhang
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design several experiments to understand the effectiveness of our proposed approach compared to system identification and robust RL approaches in unseen environments. |
| Researcher Affiliation | Collaboration | ¹Stanford University, ²Facebook AI Research. |
| Pseudocode | Yes | Algorithm 1 System Identification and Risk-Sensitive Adaptation (SIRSA) |
| Open Source Code | Yes | Code and videos of our results are on our webpage: https://sites.google.com/view/sirsa-public/home. |
| Open Datasets | Yes | Half-cheetah (Brockman et al., 2016); Peg insertion (Zhao et al., 2020; Schoettler et al., 2020). We design several environments to evaluate our approach, and in each, vary one or more parameters that affect the dynamics and/or reward function. |
| Dataset Splits | No | The paper describes its training process using replay buffers and test-time evaluation, but does not explicitly mention the use of a separate validation set or validation split for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) and REDQ, which are algorithms/frameworks, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | System identification model. We train an ensemble of B = 4 models, which are MLPs with 2 fully-connected layers of size 64 in the Point Mass domain; 2 fully-connected layers of size 256 in all other domains. Policy and critic networks. The policy and critic networks are MLPs with 2 fully-connected layers of size 64 in the Point Mass domain; 2 fully-connected layers of size 256 in all other domains. CVaR approximation. In our experiments, we use N = 50 CVaR samples to approximate the gradient of the CVaR. Training phases. In Point Mass, we optimize the SAC objectives for 25K iterations then optimize the CVaR for another 25K iterations, for a total of 50K training iterations. In the Minitaur and Peg Insertion domains, we pre-train for 150K iterations then optimize CVaR for 150K iterations for a total of 300K. In Half-Cheetah, the pre-training is 2.5M, and the CVaR optimization is 0.5M long, for a total of 3M steps. |
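
The Experiment Setup row packs several hyperparameters into one cell, so a small configuration sketch may make them easier to scan and to check. The numbers below are taken from that row. The names (`DomainSetup`, `SETUPS`, `ENSEMBLE_SIZE`, and so on) are hypothetical and chosen only for illustration; they are not taken from the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DomainSetup:
    """Per-domain settings quoted in the Experiment Setup row (names are hypothetical)."""

    hidden_size: int      # width of each fully-connected layer
    pretrain_iters: int   # first phase (SAC objectives / pre-training)
    cvar_iters: int       # second phase (CVaR optimization)

    @property
    def total_iters(self) -> int:
        # Total training length is the sum of the two phases.
        return self.pretrain_iters + self.cvar_iters


# Settings shared across domains, as stated in the row.
ENSEMBLE_SIZE = 4        # B = 4 system identification models
NUM_HIDDEN_LAYERS = 2    # all MLPs use 2 fully-connected layers
NUM_CVAR_SAMPLES = 50    # N = 50 samples for the CVaR gradient approximation

# The row reports Half-Cheetah lengths in "steps" rather than "iterations";
# they are stored in the same fields here purely for convenience.
SETUPS = {
    "point_mass": DomainSetup(hidden_size=64, pretrain_iters=25_000, cvar_iters=25_000),
    "minitaur": DomainSetup(hidden_size=256, pretrain_iters=150_000, cvar_iters=150_000),
    "peg_insertion": DomainSetup(hidden_size=256, pretrain_iters=150_000, cvar_iters=150_000),
    "half_cheetah": DomainSetup(hidden_size=256, pretrain_iters=2_500_000, cvar_iters=500_000),
}

if __name__ == "__main__":
    for name, setup in SETUPS.items():
        print(f"{name}: hidden={setup.hidden_size}, total={setup.total_iters:,}")
```

Computing the total from the two phases, rather than storing it separately, gives a quick consistency check against the totals quoted in the row (50K, 300K, and 3M).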