Seeing Differently, Acting Similarly: Heterogeneously Observable Imitation Learning

Authors: Xin-Qiang Cai, Yao-Xiang Ding, Zixuan Chen, Yuan Jiang, Masashi Sugiyama, Zhi-Hua Zhou

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that IWRE can solve various HOIL tasks, including the challenging task of transforming vision-based demonstrations into random access memory (RAM)-based policies in the Atari domain, even with limited visual observations.
Researcher Affiliation | Academia | 1 National Key Laboratory for Novel Software Technology, Nanjing University; 2 The University of Tokyo; 3 State Key Laboratory for CAD&CG, Zhejiang University; 4 RIKEN Center for Advanced Intelligence Project
Pseudocode | Yes | The pseudo-code of our algorithm is provided in the appendix (Algorithm 1: IWRE.Pretraining; Algorithm 2: IWRE.Training).
Open Source Code | No | The paper mentions using 'OpenAI baselines' (a third-party library) and provides a link to videos of results, but does not state that the authors' own implementation code for their methodology is open-source or publicly available.
Open Datasets | Yes | We choose three pixel-memory based games in Atari and five continuous control tasks in MuJoCo on the OpenAI platform (Brockman et al., 2016). For the pixel-memory Atari games, OE: 84×84×4 raw pixels; OL: 128-byte random access memory (RAM). For the continuous control MuJoCo tasks, OE: half of the original observation features; OL: the other half of the original observation features. In addition, twenty expert trajectories were collected for each environment (see the environment sketch after this table).
Dataset Splits | No | The paper specifies the environment setup and data collection ('twenty expert trajectories were collected') but does not explicitly detail training, validation, or test dataset splits (e.g., percentages or sample counts for each split).
Hardware Specification | Yes | All experiments were conducted on server clusters with NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions algorithms such as PPO and DDPG-based agents and optimizers such as Adam, but does not provide version numbers for software dependencies (e.g., 'PyTorch 1.x', 'TensorFlow 2.x', or specific libraries beyond general frameworks like 'OpenAI baselines').
Experiment Setup | Yes | The learning steps were 10^7 for the MuJoCo and 5 × 10^6 for the Atari environments. In the pretraining, we sampled 20 trajectories from π1... The buffer size for TPIL and IWRE was set to 5000. Each time the buffer is full, the encoder and the rejection model are updated for 4 epochs; LBC also updates π2 for 100 epochs with batch size 256... The rejection model and discriminator were updated using Adam with a decayed learning rate of 3 × 10^-4; the batch size was 256. The ratio of update frequencies between the learner and the discriminator was 3:1. The target coverage c in Equation (17) was set to 0.8, and λ in Equation (17) was 1.0. (See the configuration sketch after this table.)
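
For the "Open Datasets" row above, the heterogeneous observation split (pixels for the expert versus RAM for the learner in Atari; disjoint halves of the state vector in MuJoCo) can be set up with standard OpenAI Gym constructions. The sketch below is an illustrative assumption rather than the authors' released code: the environment IDs, the DeepMind-style pixel preprocessing, and the HalfObservationWrapper are choices supplied here, not details confirmed by the paper.

```python
# Minimal sketch (assumed, not the authors' code) of the heterogeneous
# observation setup described in the "Open Datasets" row.
import gym
import numpy as np

# Atari: the expert observes pixels (84x84x4 after the usual DeepMind-style
# preprocessing of grayscaling, resizing, and frame stacking), while the
# learner observes the 128-byte RAM state exposed by the "-ram" variants.
expert_atari = gym.make("BreakoutNoFrameskip-v4")       # raw frames; preprocess to 84x84x4
learner_atari = gym.make("Breakout-ramNoFrameskip-v4")  # 128-byte RAM observations


class HalfObservationWrapper(gym.ObservationWrapper):
    """Expose only one half of a MuJoCo observation vector."""

    def __init__(self, env, first_half=True):
        super().__init__(env)
        dim = env.observation_space.shape[0]
        self.idx = slice(0, dim // 2) if first_half else slice(dim // 2, dim)
        self.observation_space = gym.spaces.Box(
            low=env.observation_space.low[self.idx],
            high=env.observation_space.high[self.idx],
            dtype=np.float32,
        )

    def observation(self, obs):
        return obs[self.idx].astype(np.float32)


# MuJoCo: expert and learner each see a disjoint half of the state features.
expert_mujoco = HalfObservationWrapper(gym.make("Hopper-v2"), first_half=True)
learner_mujoco = HalfObservationWrapper(gym.make("Hopper-v2"), first_half=False)
```

The twenty expert trajectories per environment would then be rolled out under the expert-side observations and stored as demonstrations.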
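
The "Experiment Setup" row quotes the hyperparameters scattered through the paper's text; gathered into one place they could be expressed as the configuration below. The dictionary keys are naming choices made here, and only the numeric values come from the row above.

```python
# Hedged sketch: the quoted hyperparameters gathered into one configuration.
# Key names are illustrative; the numeric values are those stated above.
iwre_config = {
    "total_env_steps": {"mujoco": int(1e7), "atari": int(5e6)},
    "pretraining_trajectories": 20,             # sampled from pi_1 during pretraining
    "buffer_size": 5000,                        # used by both TPIL and IWRE
    "encoder_rejection_epochs": 4,              # per filled buffer
    "lbc_policy_epochs": 100,                   # LBC updates pi_2 per filled buffer
    "lbc_batch_size": 256,
    "optimizer": "Adam",                        # with a decayed learning rate
    "learning_rate": 3e-4,                      # rejection model and discriminator
    "batch_size": 256,
    "learner_to_discriminator_update_ratio": (3, 1),
    "target_coverage_c": 0.8,                   # Equation (17)
    "lambda_coef": 1.0,                         # Equation (17)
}
```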