SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies
Authors: Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Animashree Anandkumar
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%). |
| Researcher Affiliation | Collaboration | 1Stanford University, CA, USA. 2NVIDIA, CA, USA. 3The University of Texas at Austin, TX, USA. 4California Institute of Technology, CA, USA. Correspondence to: Linxi Fan <jimfan@cs.stanford.edu>. |
| Pseudocode | Yes | Algorithm 1 shows the full pseudocode. |
| Open Source Code | Yes | Code release and video are available at this link. |
| Open Datasets | Yes | We benchmark on DeepMind Control Suite (DMControl) with randomized color and video backgrounds (Hansen et al., 2020)... (1) Robosuite (Zhu et al., 2020):... (2) CARLA (Dosovitskiy et al., 2017):... (3) iGibson (Shen et al., 2020): indoor object navigation in 20 distinct rooms with a large variety of interior design and layouts that we standardize. |
| Dataset Splits | No | The paper details training procedures and evaluation on test environments, but it does not explicitly describe a separate validation dataset split or its purpose within the experimental setup. |
| Hardware Specification | Yes | Averaged over 1000 inference steps, SECANT is 65× faster than PAD on an Intel Xeon Gold 5220 (2.2 GHz) CPU, and 42× faster on an Nvidia RTX 2080Ti GPU. |
| Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) and various augmentation techniques (e.g., Mixup, Cutmix) but does not provide specific version numbers for any software, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | Algorithm details. SECANT builds upon SAC, and adopts similar hyperparameters and network architecture as Kostrikov et al. (2020). Observations are stacks of 3 consecutive RGB frames. For all tasks, we use a 4-layer feed-forward ConvNet with no residual connection as encoder for both the SECANT expert and student... All methods are trained for 500K steps with dense task-specific rewards... Following prior works (Hansen et al., 2020) on DMControl, we repeat training across 10 random seeds to report the mean and standard deviation of the rewards. We use 5 random seeds for all other simulators and ablation studies. |
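The experiment setup states that observations are stacks of 3 consecutive RGB frames. A minimal sketch of such a frame-stacking buffer is below; this is an illustrative reconstruction, not the authors' released code, and the `FrameStack` class name and its interface are assumptions. Frames are represented here by plain integers for brevity.

```python
from collections import deque

class FrameStack:
    """Minimal sketch (assumed interface, not the SECANT code) of the
    3-frame observation stacking described in the experiment setup:
    each observation is the 3 most recent RGB frames."""

    def __init__(self, k=3):
        self.k = k
        self.frames = deque(maxlen=k)  # oldest frame drops automatically

    def reset(self, frame):
        # A common convention: on episode reset, repeat the first
        # frame k times so the stack is always full.
        self.frames.clear()
        for _ in range(self.k):
            self.frames.append(frame)
        return list(self.frames)

    def step(self, frame):
        # Append the newest frame; deque(maxlen=k) evicts the oldest.
        self.frames.append(frame)
        return list(self.frames)

# Hypothetical usage with placeholder integer "frames".
stack = FrameStack(k=3)
obs = stack.reset(0)   # [0, 0, 0]
obs = stack.step(1)    # [0, 0, 1]
obs = stack.step(2)    # [0, 1, 2]
print(obs)
```

In a real pipeline each element would be an RGB array and the k frames would be concatenated along the channel axis before being fed to the 4-layer ConvNet encoder mentioned above.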