Robust Policy Learning over Multiple Uncertainty Sets
Authors: Annie Xie, Shagun Sodhani, Chelsea Finn, Joelle Pineau, Amy Zhang
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We design several experiments to understand the effectiveness of our proposed approach compared to system identification and robust RL approaches in unseen environments. |
| Researcher Affiliation | Collaboration | 1Stanford University 2Facebook AI Research. |
| Pseudocode | Yes | Algorithm 1 System Identification and Risk-Sensitive Adaptation (SIRSA) |
| Open Source Code | Yes | Code and videos of our results are on our webpage: https://sites.google.com/view/sirsa-public/home. |
| Open Datasets | Yes | Half-cheetah (Brockman et al., 2016); Peg insertion (Zhao et al., 2020; Schoettler et al., 2020). We design several environments to evaluate our approach, and in each, vary one or more parameters that affect the dynamics and/or reward function. |
| Dataset Splits | No | The paper describes its training process using replay buffers and test-time evaluation, but does not explicitly mention the use of a separate validation set or validation split for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) and REDQ, which are algorithms/frameworks, but does not specify software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | System identification model. We train an ensemble of B = 4 models, which are MLPs with 2 fully-connected layers of size 64 in the Point Mass domain and 2 fully-connected layers of size 256 in all other domains. Policy and critic networks. The policy and critic networks are MLPs with 2 fully-connected layers of size 64 in the Point Mass domain and 2 fully-connected layers of size 256 in all other domains. CVaR approximation. In our experiments, we use N = 50 CVaR samples to approximate the gradient of the CVaR. Training phases. In Point Mass, we optimize the SAC objectives for 25K iterations and then optimize the CVaR for another 25K iterations, for a total of 50K training iterations. In the Minitaur and Peg Insertion domains, we pre-train for 150K iterations and then optimize the CVaR for 150K iterations, for a total of 300K. In Half-Cheetah, pre-training is 2.5M iterations and CVaR optimization is 0.5M, for a total of 3M steps. |
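To make the CVaR setup in the row above concrete, here is a minimal NumPy sketch of the standard empirical CVaR estimator (the mean of the worst α-fraction of N sampled returns, with N = 50 as in the paper). The function name `cvar` and the choice of estimator are illustrative assumptions; the paper's exact CVaR gradient computation may differ.

```python
import numpy as np

def cvar(returns, alpha=0.1):
    """Empirical CVaR_alpha: mean of the worst alpha-fraction of sampled returns.

    Hypothetical helper illustrating the N-sample CVaR approximation; the
    paper's exact estimator and gradient computation may differ.
    """
    returns = np.sort(np.asarray(returns, dtype=float))  # ascending: worst first
    k = max(1, int(np.ceil(alpha * len(returns))))       # number of tail samples
    return returns[:k].mean()

# Usage: estimate CVaR over N = 50 returns sampled from the uncertainty set
rng = np.random.default_rng(0)
sampled_returns = rng.normal(loc=100.0, scale=10.0, size=50)
tail_value = cvar(sampled_returns, alpha=0.1)
```

Optimizing this tail statistic (rather than the mean return) is what makes the adaptation phase risk-sensitive: the policy is updated against the worst-case environments among the sampled ones.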