Blocks Assemble! Learning to Assemble with Large-Scale Structured Reinforcement Learning
Authors: Seyed Kamyar Seyed Ghasemipour, Satoshi Kataoka, Byron David, Daniel Freeman, Shixiang Shane Gu, Igor Mordatch
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we highlight the importance of large-scale training, structured representations, contributions of multi-task vs. single-task learning, as well as the effects of curriculums, and discuss qualitative behaviors of trained agents. |
| Researcher Affiliation | Industry | 1Google Research. Correspondence to: Seyed Kamyar Seyed Ghasemipour <kamyar@google.com>. |
| Pseudocode | Yes | Appendix C. Graph Neural Network Architecture. (Contains Python code blocks for the network architecture; a hedged architecture sketch follows the table below.) |
| Open Source Code | No | Our accompanying project webpage can be found at: sites.google.com/view/learning-direct-assembly. (At the time of checking, the linked page states 'Code coming soon!', indicating the code is not yet publicly available.) |
| Open Datasets | No | To specify the assembly task, we designed 165 blueprints (split into 141 train, 24 test) describing interesting structures to be built... (The paper states the authors designed the blueprints, but does not provide a link or specific citation for public access to this dataset.) |
| Dataset Splits | No | To specify the assembly task, we designed 165 blueprints (split into 141 train, 24 test)... (The paper mentions train and test splits, but no explicit validation split for the dataset is described.) |
| Hardware Specification | Yes | Unless otherwise specified, our agents are trained for 1 Billion environment timesteps, using 1 Nvidia V100 GPU for training, and 3000 preemptible CPUs for generating rollouts in the environment. |
| Software Dependencies | No | The key libraries used for training are Jax (Bradbury et al., 2018), Jraph (Godwin* et al., 2020), Haiku (Hennigan et al., 2020), and Acme (Hoffman et al., 2020). (The library names are given, but no version numbers are specified; see the version-logging sketch after the table.) |
| Experiment Setup | Yes | We train our agents using Proximal Policy Optimization (PPO) (Schulman et al., 2017) and Generalized Advantage Estimation (GAE) (Schulman et al., 2015), and follow the practical PPO training advice of (Andrychowicz et al., 2020a). ... Unless otherwise specified, our agents are trained for 1 Billion environment timesteps... Episodes are 100 environment steps long... Unless otherwise specified, we reset from training blueprints with probability 0.2. (Appendix C also details network hyperparameters such as NUM_GN_GAT_LAYERS = 3 and VALUE_MLP_LAYER_SIZES = [512, 512, 512]; see the GAE sketch after the table.) |
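
Because the dependency versions are unreported, a re-implementation may want to record the versions it actually installs. The snippet below is a minimal sketch, assuming the standard PyPI package names (jax, jraph, dm-haiku, dm-acme); it is not taken from the paper.

```python
# Hedged sketch: the paper names Jax, Jraph, Haiku, and Acme but gives no
# version numbers. This records whatever versions the local environment has,
# assuming the usual PyPI distribution names.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["jax", "jraph", "dm-haiku", "dm-acme"]:
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")
```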
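The paper's own architecture code is in Appendix C; the sketch below is a re-sketch under assumptions, not the authors' code. Only NUM_GN_GAT_LAYERS and VALUE_MLP_LAYER_SIZES come from the quoted hyperparameters; a generic `jraph.GraphNetwork` stands in for the paper's GAT-style layers, and the hidden size 128, the value-head placement on the global feature, and the presence of node, edge, and global features in the input `GraphsTuple` are all illustrative assumptions.

```python
# Hedged sketch of a Haiku/Jraph graph-network value head. Not the authors'
# code: only the two named hyperparameters are taken from the paper.
import haiku as hk
import jax.numpy as jnp
import jraph

NUM_GN_GAT_LAYERS = 3                      # from Appendix C (quoted above)
VALUE_MLP_LAYER_SIZES = [512, 512, 512]    # from Appendix C (quoted above)

def value_network(graph: jraph.GraphsTuple) -> jnp.ndarray:
    # Assumes graph.nodes, graph.edges, and graph.globals are all non-None.
    for _ in range(NUM_GN_GAT_LAYERS):
        gn = jraph.GraphNetwork(
            # Edge update: MLP over [edge, sender node, receiver node].
            update_edge_fn=lambda e, s, r, g: hk.nets.MLP([128])(
                jnp.concatenate([e, s, r], axis=-1)),
            # Node update: MLP over [node, aggregated sent/received edges].
            update_node_fn=lambda n, se, re, g: hk.nets.MLP([128])(
                jnp.concatenate([n, se, re], axis=-1)),
            # Global update: MLP over [aggregated nodes, aggregated edges, globals].
            update_global_fn=lambda n, e, g: hk.nets.MLP([128])(
                jnp.concatenate([n, e, g], axis=-1)),
        )
        graph = gn(graph)
    # Value head: MLP on the per-graph global feature, one scalar per graph.
    return hk.nets.MLP(VALUE_MLP_LAYER_SIZES + [1])(graph.globals)

value_fn = hk.without_apply_rng(hk.transform(value_network))
```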
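The experiment-setup row quotes training with PPO and GAE. As a reference point for re-implementation, here is a minimal, standard GAE recursion; `gamma` and `lam` are illustrative defaults rather than values reported in the paper, and episode-termination masking is omitted for brevity.

```python
# Hedged sketch of Generalized Advantage Estimation (Schulman et al., 2015).
# gamma/lam are illustrative; the paper's values are not quoted in this review.
import numpy as np

def compute_gae(rewards, values, last_value, gamma=0.99, lam=0.95):
    """Backward recursion A_t = delta_t + gamma*lam*A_{t+1},
    where delta_t = r_t + gamma*V(s_{t+1}) - V(s_t)."""
    values = np.append(values, last_value)
    advantages = np.zeros(len(rewards))
    gae = 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    returns = advantages + values[:-1]
    return advantages, returns

# Example: one rollout matching the 100-step episode length quoted above.
adv, ret = compute_gae(np.zeros(100), np.zeros(100), last_value=0.0)
```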