Self-Supervised Policy Adaptation during Deployment
Authors: Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluations are performed on diverse simulation environments from DeepMind Control suite and ViZDoom, as well as real robotic manipulation tasks in continuously changing environments, taking observations from an uncalibrated camera. Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments. |
| Researcher Affiliation | Academia | Nicklas Hansen¹², Rishabh Jangir¹³, Yu Sun⁴, Guillem Alenyà³, Pieter Abbeel⁴, Alexei A. Efros⁴, Lerrel Pinto⁵, Xiaolong Wang¹ (¹UC San Diego, ²Technical University of Denmark, ³IRI, CSIC-UPC, ⁴UC Berkeley, ⁵NYU) |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. The training and testing procedures are described mathematically using equations, but not in a pseudocode format. |
| Open Source Code | Yes | Webpage and implementation: https://nicklashansen.github.io/PAD/ |
| Open Datasets | Yes | In simulation, we evaluate our method (PAD) and baselines extensively on continuous control tasks from DeepMind Control (DMControl) suite (Tassa et al., 2018) as well as the CRLMaze (Lomonaco et al., 2019) navigation task... |
| Dataset Splits | No | The paper describes training and testing environments, but does not explicitly mention the use of a separate validation dataset split with specified percentages, sample counts, or methodology. |
| Hardware Specification | No | The paper mentions using a 'Kinova Gen3 robot' for real-world tasks but does not provide details on the specific computational hardware (e.g., GPU models, CPU types, memory, or cloud instances) used for training models or running simulations. |
| Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) and Advantage Actor-Critic (A2C) algorithms and the Adam optimizer, but it does not specify version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | Network details. For DMControl and the robotic manipulation tasks we implement PAD on top of Soft Actor-Critic (SAC) (Haarnoja et al., 2018), and adopt both network architecture and hyperparameters from Yarats et al. (2019), with minor modifications... See appendix F for implementation details. |
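The paper's core mechanism is test-time adaptation: at deployment, a self-supervised auxiliary head (e.g. inverse dynamics) updates the shared encoder while the policy head stays frozen. As the paper provides no pseudocode, the following is a minimal NumPy sketch of that loop under simplifying assumptions: linear stand-ins replace the authors' convolutional encoder and network heads, and all names (`W_enc`, `adapt_step`, etc.) are hypothetical, not from the released implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
obs_dim, feat_dim, act_dim = 8, 4, 2

# Hypothetical linear stand-ins for the shared encoder, the frozen
# policy head, and the self-supervised inverse-dynamics head.
W_enc = rng.normal(scale=0.1, size=(feat_dim, obs_dim))
W_pol = rng.normal(scale=0.1, size=(act_dim, feat_dim))
W_inv = rng.normal(scale=0.1, size=(act_dim, 2 * feat_dim))

def encode(o, W):
    return W @ o

def inverse_dynamics_loss(o_t, o_t1, a_t, W):
    """Squared error of predicting a_t from consecutive embeddings."""
    z = np.concatenate([encode(o_t, W), encode(o_t1, W)])
    return 0.5 * np.sum((W_inv @ z - a_t) ** 2)

def adapt_step(o_t, o_t1, a_t, W, lr=0.05):
    """One self-supervised gradient step on the shared encoder only;
    the policy head is untouched at deployment, as in PAD."""
    err = W_inv @ np.concatenate([encode(o_t, W), encode(o_t1, W)]) - a_t
    g_t = W_inv[:, :feat_dim].T @ err    # dL/dz_t
    g_t1 = W_inv[:, feat_dim:].T @ err   # dL/dz_{t+1}
    grad_W = np.outer(g_t, o_t) + np.outer(g_t1, o_t1)  # chain rule
    return W - lr * grad_W

# Deployment: observe a transition, adapt the encoder, then act.
o_t, o_t1 = rng.normal(size=obs_dim), rng.normal(size=obs_dim)
a_t = rng.normal(size=act_dim)

before = inverse_dynamics_loss(o_t, o_t1, a_t, W_enc)
for _ in range(10):
    W_enc = adapt_step(o_t, o_t1, a_t, W_enc)
after = inverse_dynamics_loss(o_t, o_t1, a_t, W_enc)

action = W_pol @ encode(o_t1, W_enc)  # policy acts on adapted features
```

After a few updates the auxiliary loss drops (`after < before`), which is the signal PAD exploits: aligning the encoder with the deployment distribution without any reward.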