Visual Imitation Learning with Patch Rewards
Authors: Minghuan Liu, Tairan He, Weinan Zhang, Shuicheng Yan, Zhongwen Xu
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on DeepMind Control Suite and Atari tasks. The experiment results have demonstrated that PatchAIL outperforms baseline methods and provides valuable interpretations for visual demonstrations. |
| Researcher Affiliation | Collaboration | ¹Shanghai Jiao Tong University, ²Sea AI Lab. {minghuanliu, whynot, wnzhang}@sjtu.edu.cn, {yansc,xuzw}@sea.com |
| Pseudocode | Yes | Appendix A (ALGORITHM OUTLINE), Algorithm 1: Adversarial Imitation Learning with Patch rewards (PatchAIL). A minimal training-loop sketch based on this outline appears after this table. |
| Open Source Code | Yes | Our codes are available at https://github.com/sail-sg/PatchAIL. |
| Open Datasets | Yes | We evaluate our method on DeepMind Control Suite and Atari tasks. For expert demonstrations on DMC, we use 10 trajectories collected by a DrQ-v2 (Yarats et al., 2021) agent trained using the ground-truth reward. For Atari, we use the data collected by RL Unplugged (Gulcehre et al., 2020) as expert demonstrations. |
| Dataset Splits | No | The paper describes training and testing procedures but does not explicitly mention distinct training/test/validation dataset splits with percentages or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with specific versions like PyTorch 1.9 or CUDA 11.1). |
| Experiment Setup | Yes | C.2 IMPORTANT HYPERPARAMETERS: We list the key hyperparameters of AIL methods used in our experiment in Tab. 2. The hyperparameters are the same for all domains and tasks evaluated in this paper. Common defaults: replay buffer size 150000; learning rate 1e-4; discount γ 0.99; frame stack / n-step returns 3; action repeat 2; mini-batch size 256; agent update frequency 2; critic soft-update rate 0.01; feature dim 50; hidden dim 1024; optimizer Adam; exploration steps 2000; DDPG exploration schedule linear(1, 0.1, 500000); gradient penalty coefficient 10; target feature processor update frequency 20000 steps; default reward scale 1.0; default data augmentation random shift. PatchAIL-W: λ initial value 1.3; replay buffer size 1000000. PatchAIL-B: λ initial value 0.5; replay buffer size 1000000; reward scale 0.5. These values are transcribed into a config sketch after this table. |
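
As referenced in the Pseudocode row, here is a minimal PyTorch sketch of the two pieces that distinguish the method as described in the paper: a fully convolutional (PatchGAN-style) discriminator that emits one logit per image patch, and a reward that aggregates those patch scores into a scalar for the RL learner. All names (`PatchDiscriminator`, `patch_reward`, etc.) are illustrative, not the authors' released code; the gradient penalty uses the coefficient 10 quoted in Tab. 2, and the λ-based patch-reward weighting/bonus of the -W/-B variants is omitted.

```python
# Minimal sketch of adversarial imitation learning with patch rewards,
# following the high-level description of Algorithm 1 (PatchAIL).
# Illustrative only: class/function names are NOT from the released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator: fully convolutional, so the output is
    a grid of per-patch logits rather than a single scalar score."""
    def __init__(self, in_channels: int = 9):  # e.g. 3 stacked RGB frames
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2),
            nn.Conv2d(64, 1, kernel_size=4, stride=1, padding=1),  # 1 logit/patch
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return self.net(obs)  # (B, 1, H', W') patch logit map

def patch_reward(disc: PatchDiscriminator, obs: torch.Tensor) -> torch.Tensor:
    """Aggregate per-patch scores into one scalar reward per observation.
    Mean over log D(patch) is one standard AIL choice; the paper also
    studies weighted/bonus variants (-W/-B), which are omitted here."""
    return F.logsigmoid(disc(obs)).mean(dim=(1, 2, 3))

def gradient_penalty(disc, expert_obs, agent_obs, coef=10.0):
    """Penalize discriminator gradients on interpolated observations
    (coefficient 10 per Tab. 2 of the paper)."""
    alpha = torch.rand(expert_obs.size(0), 1, 1, 1, device=expert_obs.device)
    interp = (alpha * expert_obs + (1 - alpha) * agent_obs).requires_grad_(True)
    grads = torch.autograd.grad(disc(interp).sum(), interp, create_graph=True)[0]
    return coef * ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()

def discriminator_loss(disc, expert_obs, agent_obs):
    """GAN-style objective: expert patches labeled 1, agent patches 0,
    plus the gradient penalty above."""
    exp_logits, agt_logits = disc(expert_obs), disc(agent_obs)
    bce = (F.binary_cross_entropy_with_logits(exp_logits, torch.ones_like(exp_logits))
           + F.binary_cross_entropy_with_logits(agt_logits, torch.zeros_like(agt_logits)))
    return bce + gradient_penalty(disc, expert_obs, agent_obs)
```

In the full loop, `patch_reward` stands in for the environment reward when training the DrQ-v2 agent, while `discriminator_loss` is minimized on minibatches drawn from the expert demonstrations and the agent's replay buffer.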
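
For quick reference, the hyperparameters quoted in the Experiment Setup row can also be read as a plain config. The dictionary layout below is our own framing, but every value is taken verbatim from the quoted Tab. 2.

```python
# Hyperparameters transcribed from the paper's Tab. 2 (dict layout is
# illustrative; values are as quoted in the Experiment Setup row above).
COMMON_DEFAULTS = dict(
    replay_buffer_size=150_000,
    learning_rate=1e-4,
    discount=0.99,                  # γ
    frame_stack=3,                  # also the n-step return length
    action_repeat=2,
    mini_batch_size=256,
    agent_update_frequency=2,
    critic_soft_update_rate=0.01,
    feature_dim=50,
    hidden_dim=1024,
    optimizer="Adam",
    exploration_steps=2000,
    ddpg_exploration_schedule="linear(1, 0.1, 500000)",
    gradient_penalty_coefficient=10,
    target_feature_processor_update_freq_steps=20_000,
    reward_scale=1.0,
    data_augmentation="random shift",
)

# Variant-specific overrides (λ is the patch-reward weighting/bonus weight).
PATCH_AIL_W = dict(COMMON_DEFAULTS, lambda_init=1.3, replay_buffer_size=1_000_000)
PATCH_AIL_B = dict(COMMON_DEFAULTS, lambda_init=0.5, replay_buffer_size=1_000_000,
                   reward_scale=0.5)
```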