Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning to Control Self-Assembling Morphologies: A Study of Generalization via Modularity

Authors: Deepak Pathak, Chris Lu, Trevor Darrell, Phillip Isola, Alexei A. Efros

NeurIPS 2019

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the performance of these dynamic and modular agents in simulated environments. We demonstrate better generalization to test-time changes both in the environment, as well as in the structure of the agent, compared to static and monolithic baselines. |
| Researcher Affiliation | Academia | Deepak Pathak (UC Berkeley); Chris Lu (UC Berkeley); Trevor Darrell (UC Berkeley); Phillip Isola (MIT); Alexei A. Efros (UC Berkeley) |
| Pseudocode | Yes | DGN pseudo-code (as well as source code) and all training implementation details are in Sections 1.1 and 1.4 of the supplementary. |
| Open Source Code | Yes | Project video and code are available at https://pathak22.github.io/modular-assemblies/. |
| Open Datasets | No | The paper states that the authors created their own environments because existing benchmarks did not support their research needs. No specific public dataset is used or provided with access information for training. |
| Dataset Splits | No | The paper does not explicitly provide specific training/test/validation dataset splits (percentages or counts) or refer to predefined validation splits with citations. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. It only mentions the Unity ML framework. |
| Software Dependencies | No | The paper mentions "Unity ML" and "Mujoco gym environments" but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Across all the tasks, the number of limbs at training is kept fixed to 6. At test, we report the mean reward across 50 episodes of 1200 environment steps. The reward function for locomotion is defined as the distance covered by the agent along the X-axis. Limbs start each episode disconnected and located just above the ground plane at random locations. |
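The evaluation protocol in the last row can be sketched in a few lines. This is a minimal illustration, not the authors' code: the `env_step` callable standing in for the Unity simulator is hypothetical, and the locomotion reward is simplified to accumulated per-step X displacement, matching the paper's "distance covered along the X-axis" description.

```python
EPISODES = 50            # evaluation episodes, per the paper
STEPS_PER_EPISODE = 1200  # environment steps per episode

def episode_reward(env_step, steps=STEPS_PER_EPISODE):
    """Locomotion reward: total distance covered along the X-axis."""
    x = 0.0
    for _ in range(steps):
        x += env_step()  # hypothetical per-step X displacement of the agent
    return x

def mean_reward(env_step, episodes=EPISODES):
    """Reported metric: mean episodic reward across evaluation episodes."""
    return sum(episode_reward(env_step) for _ in range(episodes)) / episodes

# Stand-in for the real simulator step: a constant forward displacement.
print(mean_reward(lambda: 0.01))
```

With a constant 0.01 displacement per step, each episode (and hence the mean) accumulates 0.01 x 1200 = 12.0 units of X-axis distance.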