Phasic Self-Imitative Reduction for Sparse-Reward Goal-Conditioned Reinforcement Learning

Authors: Yunfei Li, Tian Gao, Jiaqi Yang, Huazhe Xu, Yi Wu

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on a variety of goal-conditioned control problems, including relatively simple benchmarks such as pushing and ant-maze navigation, and a challenging sparse-reward cube-stacking task.
Researcher Affiliation | Academia | (1) Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China; (2) Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, CA, USA; (3) Stanford University, CA, USA; (4) Shanghai Qi Zhi Institute, Shanghai, China.
Pseudocode | Yes | Algorithm 1: Phasic Self-Imitative Reduction (see the sketch below the table).
Open Source Code | Yes | The project webpage is at https://sites.google.com/view/pair-gcrl.
Open Datasets | Yes | Push is a robotic pushing environment adopted from (Nair et al., 2018b), simulated with the MuJoCo (Todorov et al., 2012) engine.
Dataset Splits | No | The paper describes a phasic training approach with online RL and offline SL, and mentions collecting data for training. However, it does not provide specific percentages or counts for train/validation/test dataset splits, nor does it refer to predefined splits for these purposes.
Hardware Specification | Yes | All the experiments are repeated over 3 random seeds on a single desktop machine with a GTX3090 GPU.
Software Dependencies | No | The paper mentions software such as MuJoCo and PyBullet, and algorithms such as PPO and the Adam optimizer, with citations to their original papers, but it does not provide explicit version numbers for these packages or for other common libraries (e.g., Python, PyTorch/TensorFlow).
Experiment Setup | Yes | All the hyper-parameters are listed in Table 2.
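The Pseudocode row refers to Algorithm 1 of the paper, which, as also noted in the Dataset Splits row, alternates an online RL phase with an offline self-imitation (supervised learning) phase. The Python sketch below illustrates only that high-level phasic alternation; it is not the authors' implementation. The names Trajectory, DemoBuffer, train_pair, collect_rollouts, rl_update, and imitation_update, as well as the default phase lengths, are hypothetical placeholders, and paper-specific details (e.g., how goals are reduced or relabeled) are omitted.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class Trajectory:
    """One goal-conditioned episode; `success` marks whether the goal was reached."""
    observations: list
    actions: list
    success: bool


@dataclass
class DemoBuffer:
    """Self-generated demonstrations harvested from successful episodes."""
    trajectories: List[Trajectory] = field(default_factory=list)

    def add_successes(self, rollouts: Sequence[Trajectory]) -> None:
        # Keep only episodes that reached their goal; they are reused as
        # demonstrations in the offline self-imitation phase.
        self.trajectories.extend(t for t in rollouts if t.success)


def train_pair(
    policy,
    collect_rollouts: Callable[[object, int], List[Trajectory]],
    rl_update: Callable[[object, List[Trajectory]], None],
    imitation_update: Callable[[object, List[Trajectory]], None],
    num_phases: int = 100,
    online_steps: int = 10_000,
    offline_epochs: int = 5,
) -> None:
    """Alternate an online RL phase with an offline self-imitation (SL) phase."""
    demos = DemoBuffer()
    for _ in range(num_phases):
        # Online phase: gather on-policy experience under the sparse reward
        # and apply RL updates (e.g., PPO, as cited in the paper).
        rollouts = collect_rollouts(policy, online_steps)
        rl_update(policy, rollouts)
        demos.add_successes(rollouts)

        # Offline phase: supervised self-imitation on accumulated successes.
        for _ in range(offline_epochs):
            imitation_update(policy, demos.trajectories)
```

In this sketch the RL and imitation update rules are passed in as callables, so the phasic schedule is independent of the particular policy-gradient or behavior-cloning implementation a reader might plug in.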