Solving Compositional Reinforcement Learning Problems via Task Reduction

Authors: Yunfei Li, Yilin Wu, Huazhe Xu, Xiaolong Wang, Yi Wu

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment results show that SIR can significantly accelerate and improve learning on a variety of challenging sparse-reward continuous-control problems with compositional structures. Code and videos are available at https://sites.google.com/view/sir-compositional. [...] We compare SIR with baselines without task reduction. Experiments are presented in 3 different environments simulated in the MuJoCo (Todorov et al., 2012) engine: a robotic-hand pushing scenario (denoted by Push), a robotic-gripper stacking scenario (denoted by Stack), and a 2D particle-based maze scenario with a much larger environment space (denoted by Maze).
Researcher Affiliation | Academia | Yunfei Li1, Yilin Wu2, Huazhe Xu3, Xiaolong Wang4, Yi Wu1,2 (1 Institute for Interdisciplinary Information Sciences, Tsinghua University; 2 Shanghai Qi Zhi Institute; 3 UC Berkeley; 4 UCSD)
Pseudocode | Yes | Algorithm 1: Self-Imitation via Reduction (a hedged sketch of the loop appears after this table).
Open Source Code | Yes | Code and videos are available at https://sites.google.com/view/sir-compositional.
Open Datasets | No | The paper uses custom environments simulated in MuJoCo (Push, Stack, Maze scenarios) where data is generated during training through interactions with the environment, rather than using a pre-existing publicly available dataset with a direct link or citation.
Dataset Splits | No | The paper describes different task distributions and sampling strategies for training and evaluation (e.g., "mixture of easy and hard cases", "uniform sampler", "training curriculum") but does not provide specific dataset split percentages or counts for training, validation, and testing.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions the use of MuJoCo for simulation and various algorithms such as SAC, PPO, and β-VAE, but it does not specify any software dependencies with version numbers (e.g., MuJoCo version, Python version, or specific library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | We summarize hyperparameters of off-policy and on-policy algorithms in each scenario in Table 3 and Table 4. Table 3 lists: #workers, replay buffer size, batch size, γ, learning rate, σ, total timesteps. Table 4 lists: #minibatches, γ, #opt epochs, learning rate, #workers, #steps per iter, demo reuse, total timesteps. (A hedged configuration sketch appears after this table.)
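
As referenced in the Pseudocode row above, here is a minimal, hedged sketch of how one iteration of Self-Imitation via Reduction (Algorithm 1) could be organized. Every name (sir_iteration, rollout, reduce_task, rl_update, imitation_update, the buffer objects) is a hypothetical placeholder rather than the authors' API; only the high-level structure (reduce an unsolved task to an easier one, compose the two successful trajectories, and self-imitate them) is taken from the paper.

```python
# Hypothetical sketch of one SIR training iteration. Helper callables are
# passed in so the snippet stays self-contained; this is NOT the authors'
# implementation.

def sir_iteration(env, policy, replay_buffer, demo_buffer,
                  rollout, reduce_task, rl_update, imitation_update):
    task = env.sample_task()
    traj, success = rollout(policy, env, task)
    replay_buffer.add(traj)
    if not success:
        # Task reduction: search for an intermediate task that the current
        # policy can already solve and from which the original task
        # becomes easier.
        sub_task = reduce_task(policy, task)
        traj_a, ok_a = rollout(policy, env, sub_task)
        if ok_a:
            # Resume from where the reduced task ended and attempt the
            # original task from there.
            traj_b, ok_b = rollout(policy, env, task, start=traj_a[-1])
            if ok_b:
                # The composed trajectory solves the hard task; keep it as
                # a demonstration for self-imitation.
                demo_buffer.add(traj_a + traj_b)
    # Joint objective: a standard RL update on collected experience plus
    # an imitation term on the self-generated demonstrations.
    rl_update(policy, replay_buffer)
    imitation_update(policy, demo_buffer)
```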
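Likewise, for the Experiment Setup row: the field names below mirror exactly what the paper reports in Table 3 (off-policy, SAC-style) and Table 4 (on-policy, PPO-style). The numeric values are placeholders of my own, since the tables' actual numbers are not reproduced here; a reproduction would read the concrete values out of Tables 3 and 4 for each scenario (Push, Stack, Maze).

```python
# Hypothetical config containers mirroring the hyperparameter fields the
# paper reports; all values below are placeholders, not the paper's numbers.
from dataclasses import dataclass

@dataclass
class OffPolicyConfig:               # Table 3 fields (SAC-style)
    num_workers: int = 8             # placeholder
    replay_buffer_size: int = 1_000_000
    batch_size: int = 256
    gamma: float = 0.99              # discount factor γ
    learning_rate: float = 3e-4
    sigma: float = 0.2               # exploration noise σ
    total_timesteps: int = 10_000_000

@dataclass
class OnPolicyConfig:                # Table 4 fields (PPO-style)
    num_minibatches: int = 4
    gamma: float = 0.99
    opt_epochs: int = 4
    learning_rate: float = 3e-4
    num_workers: int = 8
    steps_per_iter: int = 2048
    demo_reuse: int = 1              # how often reduced-task demos are reused
    total_timesteps: int = 10_000_000
```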