Visual Imitation Learning with Patch Rewards
Authors: Minghuan Liu, Tairan He, Weinan Zhang, Shuicheng Yan, Zhongwen Xu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on DeepMind Control Suite and Atari tasks. The experiment results have demonstrated that PatchAIL outperforms baseline methods and provides valuable interpretations for visual demonstrations. |
| Researcher Affiliation | Collaboration | ¹Shanghai Jiao Tong University, ²Sea AI Lab. {minghuanliu, whynot, wnzhang}@sjtu.edu.cn, {yansc,xuzw}@sea.com |
| Pseudocode | Yes | Appendix A (ALGORITHM OUTLINE), Algorithm 1: Adversarial Imitation Learning with Patch rewards (PatchAIL). A minimal training-loop sketch based on this outline appears after this table. |
| Open Source Code | Yes | Our codes are available at https://github.com/sail-sg/PatchAIL. |
| Open Datasets | Yes | We evaluate our method on DeepMind Control Suite and Atari tasks. For expert demonstrations on DMC, we use 10 trajectories collected by a DrQ-v2 (Yarats et al., 2021) agent trained using the ground-truth reward. For Atari, we use the data collected by RL Unplugged (Gulcehre et al., 2020) as expert demonstrations. |
| Dataset Splits | No | The paper describes training and testing procedures but does not explicitly mention distinct training/test/validation dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with specific versions like PyTorch 1.9 or CUDA 11.1). |
| Experiment Setup | Yes | C.2 IMPORTANT HYPERPARAMETERS: We list the key hyperparameters of AIL methods used in our experiment in Tab. 2. The hyperparameters are the same for all domains and tasks evaluated in this paper. Common defaults: replay buffer size 150000; learning rate 1e-4; discount γ 0.99; frame stack / n-step returns 3; action repeat 2; mini-batch size 256; agent update frequency 2; critic soft-update rate 0.01; feature dim 50; hidden dim 1024; optimizer Adam; exploration steps 2000; DDPG exploration schedule linear(1, 0.1, 500000); gradient penalty coefficient 10; target feature processor update frequency 20000 steps; default reward scale 1.0; default data augmentation random shift. PatchAIL-W: λ initial value 1.3; replay buffer size 1000000. PatchAIL-B: λ initial value 0.5; replay buffer size 1000000; reward scale 0.5. These values are transcribed into a config sketch after this table. |
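
As referenced in the Pseudocode row, here is a minimal PyTorch sketch of the two pieces that distinguish the method as described in the paper: a fully convolutional (PatchGAN-style) discriminator that emits one logit per image patch, and a reward that aggregates those patch scores into a scalar for the RL learner. All names (`PatchDiscriminator`, `patch_reward`, etc.) are illustrative, not the authors' released code; the gradient penalty uses the coefficient 10 quoted in Tab. 2, and the λ-based patch-reward weighting/bonus of the -W/-B variants is omitted.

```python
# Minimal sketch of adversarial imitation learning with patch rewards,
# following the high-level description of Algorithm 1 (PatchAIL).
# Illustrative only: class/function names are NOT from the released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator: fully convolutional, so the output is
    a grid of per-patch logits rather than a single scalar score."""
    def __init__(self, in_channels: int = 9):  # e.g. 3 stacked RGB frames
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, kernel_size=4, stride=1, padding=1),  # 1 logit/patch
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # (B, 1, H', W') patch logit map

def patch_reward(disc: PatchDiscriminator, obs: torch.Tensor) -> torch.Tensor:
    """Aggregate per-patch scores into one scalar reward per observation.
    Mean over log D(patch) is one standard AIL choice; the paper also
    studies weighted/bonus variants (-W/-B), which are omitted here."""
    return F.logsigmoid(disc(obs)).mean(dim=(1, 2, 3))

def gradient_penalty(disc, expert_obs, agent_obs, coef=10.0):
    """Penalize discriminator gradients on interpolated observations
    (coefficient 10 per Tab. 2 of the paper)."""
    alpha = torch.rand(expert_obs.size(0), 1, 1, 1, device=expert_obs.device)
    interp = (alpha * expert_obs + (1 - alpha) * agent_obs).requires_grad_(True)
    grads = torch.autograd.grad(disc(interp).sum(), interp, create_graph=True)[0]
    return coef * ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def discriminator_loss(disc, expert_obs, agent_obs):
    """GAN-style objective: expert patches labeled 1, agent patches 0,
    plus the gradient penalty above."""
    exp_logits, agt_logits = disc(expert_obs), disc(agent_obs)
    bce = (F.binary_cross_entropy_with_logits(exp_logits, torch.ones_like(exp_logits))
           + F.binary_cross_entropy_with_logits(agt_logits, torch.zeros_like(agt_logits)))
    return bce + gradient_penalty(disc, expert_obs, agent_obs)
```

In the full loop, `patch_reward` stands in for the environment reward when training the DrQ-v2 agent, while `discriminator_loss` is minimized on minibatches drawn from the expert demonstrations and the agent's replay buffer.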
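
For quick reference, the hyperparameters quoted in the Experiment Setup row can also be read as a plain config. The dictionary layout below is our own framing, but every value is taken verbatim from the quoted Tab. 2.

```python
# Hyperparameters transcribed from the paper's Tab. 2 (dict layout is
# illustrative; values are as quoted in the Experiment Setup row above).
COMMON_DEFAULTS = dict(
    replay_buffer_size=150_000,
    learning_rate=1e-4,
    discount=0.99,                  # γ
    frame_stack=3,                  # also the n-step return length
    action_repeat=2,
    mini_batch_size=256,
    agent_update_frequency=2,
    critic_soft_update_rate=0.01,
    feature_dim=50,
    hidden_dim=1024,
    optimizer="Adam",
    exploration_steps=2000,
    ddpg_exploration_schedule="linear(1, 0.1, 500000)",
    gradient_penalty_coefficient=10,
    target_feature_processor_update_freq_steps=20_000,
    reward_scale=1.0,
    data_augmentation="random shift",
)

# Variant-specific overrides (λ is the patch-reward weighting/bonus weight).
PATCH_AIL_W = dict(COMMON_DEFAULTS, lambda_init=1.3, replay_buffer_size=1_000_000)
PATCH_AIL_B = dict(COMMON_DEFAULTS, lambda_init=0.5, replay_buffer_size=1_000_000,
                   reward_scale=0.5)
```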