Robust Policy Learning over Multiple Uncertainty Sets

Authors: Annie Xie, Shagun Sodhani, Chelsea Finn, Joelle Pineau, Amy Zhang

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We design several experiments to understand the effectiveness of our proposed approach compared to system identification and robust RL approaches in unseen environments.
Researcher Affiliation | Collaboration | Stanford University; Facebook AI Research.
Pseudocode | Yes | Algorithm 1: System Identification and Risk-Sensitive Adaptation (SIRSA). A hedged sketch of the adaptation loop appears below the table.
Open Source Code | Yes | Code and videos of our results are on our webpage: https://sites.google.com/view/sirsa-public/home
Open Datasets | Yes | Half-Cheetah (Brockman et al., 2016); Peg Insertion (Zhao et al., 2020; Schoettler et al., 2020). We design several environments to evaluate our approach, and in each, vary one or more parameters that affect the dynamics and/or reward function (an illustrative sketch follows the table).
Dataset Splits | No | The paper describes its training process using replay buffers and test-time evaluation, but does not explicitly mention the use of a separate validation set or validation split for hyperparameter tuning or early stopping.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) and REDQ, which are algorithms/frameworks, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup | Yes | System identification model: we train an ensemble of B = 4 models, which are MLPs with 2 fully-connected layers of size 64 in the Point Mass domain and 2 fully-connected layers of size 256 in all other domains. Policy and critic networks: the policy and critic networks are MLPs with 2 fully-connected layers of size 64 in the Point Mass domain and 2 fully-connected layers of size 256 in all other domains. CVaR approximation: in our experiments, we use N = 50 CVaR samples to approximate the gradient of the CVaR (a sketch of this sample-based approximation follows the table). Training phases: in Point Mass, we optimize the SAC objectives for 25K iterations and then optimize the CVaR for another 25K iterations, for a total of 50K training iterations. In the Minitaur and Peg Insertion domains, we pre-train for 150K iterations and then optimize the CVaR for 150K iterations, for a total of 300K. In Half-Cheetah, the pre-training is 2.5M steps and the CVaR optimization is 0.5M steps, for a total of 3M steps.
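
The Pseudocode row above cites Algorithm 1 (SIRSA) but the report does not reproduce it. The Python sketch below illustrates one plausible reading of the test-time adaptation loop, based only on the details quoted in the table (an ensemble of B = 4 system identification models and a policy trained against a CVaR objective). The interfaces `ensemble`, `m.prior`, `m.predict`, and `policy.act` are hypothetical names for illustration, not the authors' API.

```python
import numpy as np

def sirsa_adaptation_episode(env, ensemble, policy, horizon=1000):
    """Hypothetical sketch of test-time adaptation in the spirit of Algorithm 1.

    `ensemble` is a list of B system-identification models, each mapping a
    window of recent transitions to a prediction of the environment
    parameters; `policy` maps (state, uncertainty_set) to an action.
    Both interfaces are assumptions, not the paper's exact implementation.
    """
    obs = env.reset()
    transitions = []  # recent (s, a, s') tuples used for system identification
    # Before any data is collected, fall back to a prior over the parameters.
    uncertainty_set = np.stack([m.prior() for m in ensemble])

    for _ in range(horizon):
        # The policy is conditioned on the current uncertainty set, so it can
        # act conservatively when the ensemble members disagree.
        action = policy.act(obs, uncertainty_set)
        next_obs, reward, done, _ = env.step(action)
        transitions.append((obs, action, next_obs))

        # Each ensemble member predicts the environment parameters from the
        # recent transitions; the set of B predictions is the uncertainty set.
        uncertainty_set = np.stack([m.predict(transitions) for m in ensemble])

        obs = next_obs
        if done:
            break
```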
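The Experiment Setup row states that N = 50 CVaR samples are used to approximate the gradient of the CVaR. Below is a minimal PyTorch sketch of one common sample-based CVaR estimator, assuming the CVaR is taken over return estimates for parameters drawn from the uncertainty set; the `alpha` level, the sampler, and the `critic` call in the usage comment are illustrative assumptions, not values taken from the paper.

```python
import torch

def cvar_objective(returns, alpha=0.1):
    """Sample-based CVaR: mean of the worst alpha-fraction of N sampled returns.

    `returns` is a tensor of N differentiable return estimates, e.g. critic
    values for N parameter vectors drawn from the uncertainty set (the paper
    uses N = 50). The alpha level and this exact formulation are assumptions.
    """
    n_worst = max(1, int(alpha * returns.numel()))
    worst, _ = torch.topk(returns, n_worst, largest=False)  # lowest returns
    return worst.mean()

# Usage sketch: maximize the CVaR of critic estimates over sampled parameters.
# params = uncertainty_set.sample((50,))                  # hypothetical sampler
# returns = critic(state, policy(state, params), params)  # hypothetical critic
# loss = -cvar_objective(returns, alpha=0.1)
# loss.backward()
```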
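The Open Datasets row notes that each environment varies one or more parameters affecting the dynamics and/or reward. As an illustration only, the sketch below scales the body masses of the Gym Half-Cheetah environment; the choice of parameter, the scaling mechanism, and the `HalfCheetah-v2` / mujoco_py setup are assumptions rather than the paper's actual configuration.

```python
import numpy as np
import gym  # Brockman et al., 2016

def make_scaled_half_cheetah(mass_scale=1.5):
    """Illustrative only: build a Half-Cheetah variant with scaled body masses.

    Assumes the mujoco_py-based HalfCheetah-v2 environment, whose body masses
    are exposed as a writable array on the underlying MuJoCo model.
    """
    env = gym.make("HalfCheetah-v2")
    model = env.unwrapped.model
    model.body_mass[:] = np.asarray(model.body_mass) * mass_scale
    return env
```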