Robust Subtask Learning for Compositional Generalization

Authors: Kishor Jothimurugan, Steve Hsu, Osbert Bastani, Rajeev Alur

ICML 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on two multi-task environments with continuous states and actions and demonstrate that our algorithms outperform state-of-the-art baselines. |
| Researcher Affiliation | Academia | University of Pennsylvania. Correspondence to: Kishor Jothimurugan <kishor@seas.upenn.edu>. |
| Pseudocode | Yes | Algorithm 1: Asynchronous value iteration algorithm for computing optimal subtask policies. Algorithm 2: Robust Option Soft Actor Critic. Algorithm 3: Asynchronous Robust Option SAC. (A generic value-iteration sketch is included after the table.) |
| Open Source Code | Yes | Our implementation is available online and can be found at https://github.com/keyshor/rosac. |
| Open Datasets | No | The paper mentions the "F1/10th environment" and cites "F110. F1/10 Autonomous Racing Competition. http://f1tenth.org", which is a simulator rather than a specific training dataset with access details. The "Rooms environment" appears to be custom-built, and no access details are provided. |
| Dataset Splits | No | The paper describes evaluation against adversaries and mentions subtask sequences, but it does not specify explicit numerical training/validation/test splits (e.g., percentages or sample counts). |
| Hardware Specification | No | All experiments were run on a 48-core machine with 512GB of memory and 8 GPUs. This is a general description; it does not specify the exact CPU or GPU models. |
| Software Dependencies | No | The paper names specific optimizers and algorithms (e.g., the Adam optimizer, SAC, DDPG, PPO, REINFORCE) but does not provide version numbers for these components or for underlying libraries (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | The hidden dimension used is 64 for all approaches except MADDPG, for which we use 128-dimensional hidden layers. For DAGGER, NAIVE and AROSAC we run SAC with the Adam optimizer (learning rate α = 0.01), entropy weight β = 0.05, Polyak rate 0.005 and batch size of 100. In each iteration of AROSAC and DAGGER, SAC is run for N = 10000 steps. Similarly, ROSAC is run with the Adam optimizer (learning rates αψ = αθ = 0.01), entropy weight β = 0.05, Polyak rate 0.005 and batch size of 300. The MADDPG baseline uses a learning rate of 0.0003 and batch size of 256. PAIRED uses PPO with a learning rate of 0.02, batch size of 512, minibatch size of 128 and 4 epochs for each policy update. The adversary is trained using REINFORCE with a learning rate of 0.003. (The reported values are collected into a config sketch after the table.) |
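The Pseudocode row above names an asynchronous value iteration algorithm for computing optimal subtask policies. The snippet below is a minimal, generic sketch of asynchronous (in-place, Gauss–Seidel) value iteration over a finite abstraction; it is not a reproduction of the paper's Algorithm 1, and the `states`, `actions`, `transition`, and `reward` inputs are hypothetical placeholders for illustration only.

```python
# Minimal generic sketch of asynchronous (in-place) value iteration.
# NOT the paper's Algorithm 1; states, transitions, and rewards are
# hypothetical placeholders used purely for illustration.
import random

def async_value_iteration(states, actions, transition, reward, gamma=0.99,
                          tol=1e-6, max_sweeps=1000):
    """In-place (Gauss-Seidel) value iteration over a finite state set.

    transition(s, a) -> list of (probability, next_state) pairs
    reward(s, a)     -> immediate reward (assumed bounded)
    """
    V = {s: 0.0 for s in states}
    for _ in range(max_sweeps):
        delta = 0.0
        # States are updated in a shuffled sweep; each update immediately
        # reuses the latest values, which makes the iteration asynchronous.
        order = list(states)
        random.shuffle(order)
        for s in order:
            best = max(
                reward(s, a) + gamma * sum(p * V[s2] for p, s2 in transition(s, a))
                for a in actions
            )
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    # Greedy policy extraction from the converged value estimates.
    policy = {
        s: max(actions, key=lambda a: reward(s, a)
               + gamma * sum(p * V[s2] for p, s2 in transition(s, a)))
        for s in states
    }
    return V, policy
```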
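For quick reference, the hyperparameters quoted in the Experiment Setup row can be gathered into a single structure. This is only a transcription of the reported values into an illustrative Python dictionary; the key names and grouping are assumptions and do not correspond to the configuration format used in the authors' repository.

```python
# Hyperparameters as reported in the paper's experiment setup, transcribed
# into an illustrative dictionary. Key names and grouping are assumptions
# and do not reflect the config format of https://github.com/keyshor/rosac.
REPORTED_HYPERPARAMS = {
    "hidden_dim": 64,               # all approaches except MADDPG
    "hidden_dim_maddpg": 128,
    "sac": {                        # used by DAGGER, NAIVE and AROSAC
        "optimizer": "Adam",
        "learning_rate": 0.01,      # alpha
        "entropy_weight": 0.05,     # beta
        "polyak_rate": 0.005,
        "batch_size": 100,
        "steps_per_iteration": 10_000,  # N, per AROSAC/DAGGER iteration
    },
    "rosac": {
        "optimizer": "Adam",
        "learning_rate_psi": 0.01,
        "learning_rate_theta": 0.01,
        "entropy_weight": 0.05,
        "polyak_rate": 0.005,
        "batch_size": 300,
    },
    "maddpg": {"learning_rate": 0.0003, "batch_size": 256},
    "paired": {
        "algorithm": "PPO",
        "learning_rate": 0.02,
        "batch_size": 512,
        "minibatch_size": 128,
        "epochs_per_update": 4,
        "adversary": {"algorithm": "REINFORCE", "learning_rate": 0.003},
    },
}
```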