Self-Supervised Policy Adaptation during Deployment

Authors: Nicklas Hansen, Rishabh Jangir, Yu Sun, Guillem Alenyà, Pieter Abbeel, Alexei A. Efros, Lerrel Pinto, Xiaolong Wang

ICLR 2021

Reproducibility Variable Result LLM Response
Research Type Experimental Empirical evaluations are performed on diverse simulation environments from the DeepMind Control suite and ViZDoom, as well as real robotic manipulation tasks in continuously changing environments, taking observations from an uncalibrated camera. Our method improves generalization in 31 out of 36 environments across various tasks and outperforms domain randomization on a majority of environments.
Researcher Affiliation Academia Nicklas Hansen (1,2), Rishabh Jangir (1,3), Yu Sun (4), Guillem Alenyà (3), Pieter Abbeel (4), Alexei A. Efros (4), Lerrel Pinto (5), Xiaolong Wang (1); 1: UC San Diego, 2: Technical University of Denmark, 3: IRI, CSIC-UPC, 4: UC Berkeley, 5: NYU
Pseudocode No The paper does not contain any clearly labeled pseudocode or algorithm blocks. The training and testing procedures are described mathematically using equations, but not in a pseudocode format.
Open Source Code Yes Webpage and implementation: https://nicklashansen.github.io/PAD/
Open Datasets Yes In simulation, we evaluate our method (PAD) and baselines extensively on continuous control tasks from the DeepMind Control (DMControl) suite (Tassa et al., 2018) as well as the CRLMaze (Lomonaco et al., 2019) navigation task...
Dataset Splits No The paper describes training and testing environments, but does not explicitly mention the use of a separate validation dataset split with specified percentages, sample counts, or methodology.
Hardware Specification No The paper mentions using a 'Kinova Gen3 robot' for real-world tasks but does not provide details on the specific computational hardware (e.g., GPU models, CPU types, memory, or cloud instances) used for training models or running simulations.
Software Dependencies No The paper mentions using Soft Actor-Critic (SAC) and Advantage Actor-Critic (A2C) algorithms and the Adam optimizer, but it does not specify version numbers for any programming languages, libraries, or frameworks (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup Yes Network details. For DMControl and the robotic manipulation tasks we implement PAD on top of Soft Actor-Critic (SAC) (Haarnoja et al., 2018), and adopt both network architecture and hyperparameters from Yarats et al. (2019), with minor modifications... See appendix F for implementation details.
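The paper's core mechanism, adapting the policy's shared encoder at deployment with a self-supervised objective while keeping the task heads frozen, can be illustrated with a toy example. The sketch below is not the paper's implementation (PAD uses deep convolutional encoders, SAC/A2C, and objectives such as inverse dynamics or rotation prediction); it is a minimal linear, NumPy-only analogue with an inverse-dynamics-style loss, and all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions: observation -> latent -> action (hypothetical sizes).
OBS, LAT, ACT = 8, 4, 2

# Shared encoder (updated at deployment) plus two frozen heads.
W_enc = rng.normal(scale=0.1, size=(LAT, OBS))
W_pi = rng.normal(scale=0.1, size=(ACT, LAT))        # policy head (frozen)
W_inv = rng.normal(scale=0.1, size=(ACT, 2 * LAT))   # inverse-dynamics head (frozen)


def act(obs):
    """Policy: encode the observation, then apply the frozen policy head."""
    return W_pi @ (W_enc @ obs)


def adapt_step(obs_t, obs_t1, action, lr=0.5):
    """One self-supervised encoder update from an observed transition.

    The inverse-dynamics head predicts the taken action from consecutive
    latents; its squared error is the self-supervised loss. Only the shared
    encoder W_enc is updated (both heads stay fixed), mirroring test-time
    policy adaptation.
    """
    global W_enc
    z_t, z_t1 = W_enc @ obs_t, W_enc @ obs_t1
    pred = W_inv @ np.concatenate([z_t, z_t1])
    err = pred - action
    # Gradient of 0.5*||err||^2 w.r.t. W_enc, derived by hand for the
    # linear layers: split W_inv into the blocks acting on z_t and z_t1.
    A, B = W_inv[:, :LAT], W_inv[:, LAT:]
    grad = np.outer(A.T @ err, obs_t) + np.outer(B.T @ err, obs_t1)
    W_enc -= lr * grad
    return 0.5 * float(err @ err)


# Simulated deployment: repeatedly adapting on an observed transition
# should drive the self-supervised loss down while the heads never change.
obs_t, obs_t1 = rng.normal(size=OBS), rng.normal(size=OBS)
action = rng.normal(size=ACT)
losses = [adapt_step(obs_t, obs_t1, action) for _ in range(20)]
print(f"loss before adaptation: {losses[0]:.4f}, after: {losses[-1]:.4f}")
```

In the real method the adaptation signal comes from the deployment environment's own observation stream, so no rewards or labels are needed at test time; this is what lets the policy adjust to distribution shifts such as an uncalibrated camera.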