SECANT: Self-Expert Cloning for Zero-Shot Generalization of Visual Policies

Authors: Linxi Fan, Guanzhi Wang, De-An Huang, Zhiding Yu, Li Fei-Fei, Yuke Zhu, Animashree Anandkumar

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that SECANT significantly advances the state of the art in zero-shot generalization across 4 challenging domains. Our average reward improvements over prior SOTAs are: DeepMind Control (+26.5%), robotic manipulation (+337.8%), vision-based autonomous driving (+47.7%), and indoor object navigation (+15.8%).
Researcher Affiliation | Collaboration | 1Stanford University, CA, USA. 2NVIDIA, CA, USA. 3The University of Texas at Austin, TX, USA. 4California Institute of Technology, CA, USA. Correspondence to: Linxi Fan <jimfan@cs.stanford.edu>.
Pseudocode | Yes | Algorithm 1 shows the full pseudocode.
Open Source Code | Yes | Code release and video are available at this link.
Open Datasets | Yes | We benchmark on DeepMind Control Suite (DMControl) with randomized color and video backgrounds (Hansen et al., 2020)... (1) Robosuite (Zhu et al., 2020):... (2) CARLA (Dosovitskiy et al., 2017):... (3) iGibson (Shen et al., 2020): indoor object navigation in 20 distinct rooms with a large variety of interior design and layouts that we standardize.
Dataset Splits | No | The paper details training procedures and evaluation on test environments, but it does not explicitly describe a separate validation dataset split or its purpose within the experimental setup.
Hardware Specification | Yes | Averaged over 1000 inference steps, SECANT is 65× faster than PAD on Intel Xeon Gold 5220 (2.2 GHz) CPU, and 42× faster on Nvidia RTX 2080Ti GPU.
Software Dependencies | No | The paper mentions using Soft Actor-Critic (SAC) and various augmentation techniques (e.g., Mixup, Cutmix) but does not provide specific version numbers for any software, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | Algorithm details. SECANT builds upon SAC, and adopts similar hyperparameters and network architecture as Kostrikov et al. (2020). Observations are stacks of 3 consecutive RGB frames. For all tasks, we use a 4-layer feed-forward ConvNet with no residual connection as encoder for both the SECANT expert and student... All methods are trained for 500K steps with dense task-specific rewards... Following prior works (Hansen et al., 2020) on DMControl, we repeat training across 10 random seeds to report the mean and standard deviation of the rewards. We use 5 random seeds for all other simulators and ablation studies.
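The hardware row above reports inference latency averaged over 1000 steps. A minimal sketch of that measurement protocol, in pure Python (the `policy` stub and the 9×84×84 observation shape are illustrative placeholders, not details confirmed by the paper):

```python
import time

def policy(obs):
    # Placeholder for a trained policy's forward pass; the real SECANT
    # student maps a stack of 3 RGB frames to an action.
    return sum(obs) / len(obs)

def mean_latency_ms(fn, obs, steps=1000):
    # One warm-up call, then average wall-clock time over `steps` calls.
    fn(obs)
    start = time.perf_counter()
    for _ in range(steps):
        fn(obs)
    return (time.perf_counter() - start) / steps * 1e3

# Flattened stand-in for 3 stacked RGB frames (9 channels of 84x84).
obs = [0.5] * (9 * 84 * 84)
print(f"{mean_latency_ms(policy, obs, steps=100):.3f} ms/step")
```

The reported speedups (65× on CPU, 42× on GPU) would come from running the same loop for both SECANT and PAD on identical hardware and taking the ratio of the two averages.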
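The experiment-setup row states that DMControl results are reported as the mean and standard deviation of rewards over 10 random seeds. The aggregation step can be sketched as follows (the per-seed reward values below are hypothetical, not the paper's numbers):

```python
import statistics

# Hypothetical episode rewards for one task, one value per seed (10 seeds).
seed_rewards = [812, 790, 845, 801, 828, 779, 836, 808, 795, 820]

mean = statistics.mean(seed_rewards)
# Sample standard deviation (n-1 denominator), the usual choice when
# the seeds are treated as a sample of possible training runs.
std = statistics.stdev(seed_rewards)

print(f"{mean:.1f} ± {std:.1f}")  # → 811.4 ± 21.0
```

For the other simulators and ablations, the paper uses the same procedure with 5 seeds instead of 10.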