SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Authors: Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Animashree Anandkumar

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%).
Researcher Affiliation | Collaboration | 1Stanford University, CA, USA. 2NVIDIA, CA, USA. 3The University of Texas at Austin, TX, USA. 4California Institute of Technology, CA, USA. Correspondence to: Linxi Fan <jimfan@cs.stanford.edu>.
Pseudocode | Yes | Algorithm 1 shows the full pseudocode.
Open Source Code | Yes | Code release and video are available at this link.
Open Datasets | Yes | We benchmark on DeepMind Control Suite (DMControl) with randomized color and video backgrounds (Hansen et al., 2020)... (1) Robosuite (Zhu et al., 2020):... (2) CARLA (Dosovitskiy et al., 2017):... (3) iGibson (Shen et al., 2020): indoor object navigation in 20 distinct rooms with a large variety of interior design and layouts that we standardize.
Dataset Splits | No | The paper details training procedures and evaluation on test environments, but it does not explicitly describe a separate validation dataset split or its purpose within the experimental setup.
Hardware Specification | Yes | Averaged over 1000 inference steps, SECANT is 65× faster than PAD on Intel Xeon Gold 5220 (2.2 GHz) CPU, and 42× faster on Nvidia RTX 2080Ti GPU.
Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) and various augmentation techniques (e.g., Mixup, Cutmix) but does not provide specific version numbers for any software, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | Algorithm details. SECANT builds upon SAC, and adopts similar hyperparameters and network architecture as Kostrikov et al. (2020). Observations are stacks of 3 consecutive RGB frames. For all tasks, we use a 4-layer feed-forward ConvNet with no residual connection as encoder for both the SECANT expert and student... All methods are trained for 500K steps with dense task-specific rewards... Following prior works (Hansen et al., 2020) on DMControl, we repeat training across 10 random seeds to report the mean and standard deviation of the rewards. We use 5 random seeds for all other simulators and ablation studies.
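The hardware row above reports inference latency averaged over 1000 steps. A minimal sketch of that measurement protocol, in pure Python (the `policy` stub and the 9×84×84 observation shape are illustrative placeholders, not details confirmed by the paper):

```python
import time

def policy(obs):
    # Placeholder for a trained policy's forward pass; the real SECANT
    # student maps a stack of 3 RGB frames to an action.
    return sum(obs) / len(obs)

def mean_latency_ms(fn, obs, steps=1000):
    # One warm-up call, then average wall-clock time over `steps` calls.
    fn(obs)
    start = time.perf_counter()
    for _ in range(steps):
        fn(obs)
    return (time.perf_counter() - start) / steps * 1e3

# Flattened stand-in for 3 stacked RGB frames (9 channels of 84x84).
obs = [0.5] * (9 * 84 * 84)
print(f"{mean_latency_ms(policy, obs, steps=100):.3f} ms/step")
```

The reported speedups (65× on CPU, 42× on GPU) would come from running the same loop for both SECANT and PAD on identical hardware and taking the ratio of the two averages.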
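The experiment-setup row states that DMControl results are reported as the mean and standard deviation of rewards over 10 random seeds. The aggregation step can be sketched as follows (the per-seed reward values below are hypothetical, not the paper's numbers):

```python
import statistics

# Hypothetical episode rewards for one task, one value per seed (10 seeds).
seed_rewards = [812, 790, 845, 801, 828, 779, 836, 808, 795, 820]

mean = statistics.mean(seed_rewards)
# Sample standard deviation (n-1 denominator), the usual choice when
# the seeds are treated as a sample of possible training runs.
std = statistics.stdev(seed_rewards)

print(f"{mean:.1f} ± {std:.1f}")  # → 811.4 ± 21.0
```

For the other simulators and ablations, the paper uses the same procedure with 5 seeds instead of 10.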