Solving Compositional Reinforcement Learning Problems via Task Reduction

Authors: Yunfei Li, Yilin Wu, Huazhe Xu, Xiaolong Wang, Yi Wu

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiment results show that SIR can significantly accelerate and improve learning on a variety of challenging sparse-reward continuous-control problems with compositional structures. Code and videos are available at https://sites.google.com/view/sir-compositional. [...] We compare SIR with baselines without task reduction. Experiments are presented in 3 different environments simulated in the MuJoCo (Todorov et al., 2012) engine: a robotic-hand pushing scenario (denoted by Push), a robotic-gripper stacking scenario (denoted by Stack), and a 2D particle-based maze scenario with a much larger environment space (denoted by Maze).
Researcher Affiliation | Academia | Yunfei Li1, Yilin Wu2, Huazhe Xu3, Xiaolong Wang4, Yi Wu1,2 (1 Institute for Interdisciplinary Information Sciences, Tsinghua University; 2 Shanghai Qi Zhi Institute; 3 UC Berkeley; 4 UCSD)
Pseudocode | Yes | Algorithm 1: Self-Imitation via Reduction (a hedged sketch of the loop appears after this table).
Open Source Code | Yes | Code and videos are available at https://sites.google.com/view/sir-compositional.
Open Datasets | No | The paper uses custom environments simulated in MuJoCo (Push, Stack, Maze scenarios) where data is generated during training through interactions with the environment, rather than using a pre-existing publicly available dataset with a direct link or citation.
Dataset Splits | No | The paper describes different task distributions and sampling strategies for training and evaluation (e.g., "mixture of easy and hard cases", "uniform sampler", "training curriculum") but does not provide specific dataset split percentages or counts for training, validation, and testing.
Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions the use of MuJoCo for simulation and various algorithms such as SAC, PPO, and β-VAE, but it does not specify any software dependencies with version numbers (e.g., MuJoCo version, Python version, or specific library versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | We summarize hyperparameters of off-policy and on-policy algorithms in each scenario in Table 3 and Table 4. Table 3 lists: #workers, replay buffer size, batch size, γ, learning rate, σ, total timesteps. Table 4 lists: #minibatches, γ, #opt epochs, learning rate, #workers, #steps per iter, demo reuse, total timesteps. (A hedged configuration sketch appears after this table.)
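
As referenced in the Pseudocode row above, here is a minimal, hedged sketch of how one iteration of Self-Imitation via Reduction (Algorithm 1) could be organized. Every name (sir_iteration, rollout, reduce_task, rl_update, imitation_update, the buffer objects) is a hypothetical placeholder rather than the authors' API; only the high-level structure (reduce an unsolved task to an easier one, compose the two successful trajectories, and self-imitate them) is taken from the paper.

```python
# Hypothetical sketch of one SIR training iteration. Helper callables are
# passed in so the snippet stays self-contained; this is NOT the authors'
# implementation.

def sir_iteration(env, policy, replay_buffer, demo_buffer,
                  rollout, reduce_task, rl_update, imitation_update):
    task = env.sample_task()
    traj, success = rollout(policy, env, task)
    replay_buffer.add(traj)
    if not success:
        # Task reduction: search for an intermediate task that the current
        # policy can already solve and from which the original task
        # becomes easier.
        sub_task = reduce_task(policy, task)
        traj_a, ok_a = rollout(policy, env, sub_task)
        if ok_a:
            # Resume from where the reduced task ended and attempt the
            # original task from there.
            traj_b, ok_b = rollout(policy, env, task, start=traj_a[-1])
            if ok_b:
                # The composed trajectory solves the hard task; keep it as
                # a demonstration for self-imitation.
                demo_buffer.add(traj_a + traj_b)
    # Joint objective: a standard RL update on collected experience plus
    # an imitation term on the self-generated demonstrations.
    rl_update(policy, replay_buffer)
    imitation_update(policy, demo_buffer)
```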
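Likewise, for the Experiment Setup row: the field names below mirror exactly what the paper reports in Table 3 (off-policy, SAC-style) and Table 4 (on-policy, PPO-style). The numeric values are placeholders of my own, since the tables' actual numbers are not reproduced here; a reproduction would read the concrete values out of Tables 3 and 4 for each scenario (Push, Stack, Maze).

```python
# Hypothetical config containers mirroring the hyperparameter fields the
# paper reports; all values below are placeholders, not the paper's numbers.
from dataclasses import dataclass

@dataclass
class OffPolicyConfig:               # Table 3 fields (SAC-style)
    num_workers: int = 8             # placeholder
    replay_buffer_size: int = 1_000_000
    batch_size: int = 256
    gamma: float = 0.99              # discount factor γ
    learning_rate: float = 3e-4
    sigma: float = 0.2               # exploration noise σ
    total_timesteps: int = 10_000_000

@dataclass
class OnPolicyConfig:                # Table 4 fields (PPO-style)
    num_minibatches: int = 4
    gamma: float = 0.99
    opt_epochs: int = 4
    learning_rate: float = 3e-4
    num_workers: int = 8
    steps_per_iter: int = 2048
    demo_reuse: int = 1              # how often reduced-task demos are reused
    total_timesteps: int = 10_000_000
```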