Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning

Authors: Seyed Kamyar Seyed Ghasemipour, Satoshi Kataoka, Byron David, Daniel Freeman, Shixiang Shane Gu, Igor Mordatch

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through extensive experiments, we highlight the importance of large-scale training, structured representations, contributions of multi-task vs. single-task learning, as well as the effects of curriculums, and discuss qualitative behaviors of trained agents."
Researcher Affiliation | Industry | "Google Research. Correspondence to: Seyed Kamyar Seyed Ghasemipour <kamyar@google.com>."
Pseudocode | Yes | "Appendix C. Graph Neural Network Architecture." (Contains Python code blocks for the network architecture; a hedged layer sketch follows the table.)
Open Source Code | No | "Our accompanying project webpage can be found at: sites.google.com/view/learning-direct-assembly." (Upon visiting the link, it states 'Code coming soon!', indicating the code was not yet publicly available at the time of checking.)
Open Datasets | No | "To specify the assembly task, we designed 165 blueprints (split into 141 train, 24 test) describing interesting structures to be built..." (The authors designed the blueprints themselves, but the paper provides no link or citation for public access to them.)
Dataset Splits | No | "To specify the assembly task, we designed 165 blueprints (split into 141 train, 24 test)..." (Train and test splits are given, but no explicit validation split is described.)
Hardware Specification | Yes | "Unless otherwise specified, our agents are trained for 1 Billion environment timesteps, using 1 Nvidia V100 GPU for training, and 3000 preemptible CPUs for generating rollouts in the environment."
Software Dependencies | No | "The key libraries used for training are Jax (Bradbury et al., 2018), Jraph (Godwin* et al., 2020), Haiku (Hennigan et al., 2020), and Acme (Hoffman et al., 2020)." (Library names are cited, but no version numbers are provided; a version-recording sketch follows the table.)
Experiment Setup | Yes | "We train our agents using Proximal Policy Optimization (PPO) (Schulman et al., 2017) and Generalized Advantage Estimation (GAE) (Schulman et al., 2015), and follow the practical PPO training advice of (Andrychowicz et al., 2020a). ... Unless otherwise specified, our agents are trained for 1 Billion environment timesteps... Episodes are 100 environment steps long... Unless otherwise specified, we reset from training blueprints with probability 0.2." (Appendix C also details network hyperparameters such as NUM_GN_GAT_LAYERS = 3 and VALUE_MLP_LAYER_SIZES = [512, 512, 512]; a GAE sketch follows the table.)
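
To give a concrete picture of the kind of architecture the Pseudocode row points to, the snippet below is a minimal, hypothetical Jraph/Haiku message-passing layer, not the paper's Appendix C code. The feature sizes, toy graph, and two-MLP update scheme are placeholder assumptions; the actual network uses graph-attention layers with the hyperparameters quoted above (e.g. NUM_GN_GAT_LAYERS = 3).

import haiku as hk
import jax
import jax.numpy as jnp
import jraph


def gn_layer(graph: jraph.GraphsTuple) -> jraph.GraphsTuple:
    """One message-passing step: update edges from their endpoints, then nodes from incoming edges."""
    def update_edge_fn(edges, sender_nodes, receiver_nodes, globals_):
        del globals_  # unused in this sketch
        return hk.nets.MLP([128, 128])(
            jnp.concatenate([edges, sender_nodes, receiver_nodes], axis=-1))

    def update_node_fn(nodes, sent_edges, received_edges, globals_):
        del sent_edges, globals_  # unused in this sketch
        return hk.nets.MLP([128, 128])(
            jnp.concatenate([nodes, received_edges], axis=-1))

    return jraph.GraphNetwork(update_edge_fn=update_edge_fn,
                              update_node_fn=update_node_fn)(graph)


# Toy graph with placeholder feature sizes: 3 nodes, 2 directed edges.
graph = jraph.GraphsTuple(
    nodes=jnp.ones((3, 8)), edges=jnp.ones((2, 4)),
    senders=jnp.array([0, 1]), receivers=jnp.array([1, 2]),
    globals=None, n_node=jnp.array([3]), n_edge=jnp.array([2]))

forward = hk.without_apply_rng(hk.transform(gn_layer))
params = forward.init(jax.random.PRNGKey(0), graph)
out = forward.apply(params, graph)  # GraphsTuple with updated node/edge features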
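
Because the Software Dependencies row notes that no version numbers are reported, a reproduction would need to pin the libraries itself. One simple way to record whatever versions end up installed is sketched below (not every release exposes a version attribute, hence the fallback):

import jax, jraph, haiku, acme

for lib in (jax, jraph, haiku, acme):
    # Print the module name and its version string, if the package exposes one.
    print(lib.__name__, getattr(lib, "__version__", "version attribute not found"))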
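
The Experiment Setup row cites PPO with Generalized Advantage Estimation. As a reference for the estimator itself, not the paper's implementation, here is a minimal NumPy sketch of GAE; the discount gamma and mixing parameter lambda shown are common defaults, not values reported by the paper.

import numpy as np


def gae_advantages(rewards, values, dones, gamma=0.99, lam=0.95):
    """Compute GAE advantages over one rollout.

    rewards, dones: length-T arrays; values: length-(T+1) array that includes
    the bootstrap value for the state after the final step.
    """
    T = len(rewards)
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        not_done = 1.0 - dones[t]
        # One-step TD error, zeroing the bootstrap at episode boundaries.
        delta = rewards[t] + gamma * values[t + 1] * not_done - values[t]
        gae = delta + gamma * lam * not_done * gae
        advantages[t] = gae
    return advantages  # value targets are advantages + values[:-1]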