Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Convergent Functions, Divergent Forms

Authors: Hyeonseong Jeon, Ainaz Eftekhar, Aaron Walsman, Kuo-Hao Zeng, Ali Farhadi, Ranjay Krishna

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In all experiments, morphologies are evolved for locomotion on flat terrain (FT) using UNIMAL space. Locomotion is a universal and ubiquitous evolutionary pressure across species; it is task-agnostic, avoids overfitting to narrow objectives, and is easy to simulate and reward. We evaluate both the performance and diversity of the final evolved morphologies, comparing them to other evolution-based co-design methods as well as Quality-Diversity approaches (Sec. 4.1).
Researcher Affiliation	Collaboration	¹University of Washington, ²Seoul National University, ³Allen Institute for AI, ⁴Kempner Institute at Harvard University
Pseudocode	Yes	Algorithm 1: LOKI
Open Source Code	Yes	Code with instructions to reproduce the main results are available on the project website.
Open Datasets	Yes	We use UNIMAL [4], an expressive design space encompassing approximately 10¹⁸ unique morphologies with fewer than 10 limbs.
Dataset Splits	No	The paper discusses "training durations" and "test tasks" rather than specific dataset splits for training, validation, and testing of a static dataset. For instance, "For each method, the final set of N = 100 evolved morphologies (elites) is independently trained from scratch on each test task using MLP-based policies, with 5 random seeds and training durations of 5, 15, or 20 million steps depending on task difficulty."
Hardware Specification	Yes	We train a transformer-based VAE (4 layers, 4 heads, latent dimension H = 32) on these designs using a batch size of 4096, an initial learning rate of 10⁻⁴, and a single A40 GPU for 200 epochs. ... Training is distributed across six A40 GPUs, with each GPU handling 6–7 cluster-specific policies in parallel (more details in the Appendix H).
Software Dependencies	No	The paper refers to "Pytorch" in reference [65], but does not specify a version number used in the experiments. No other specific software dependencies with version numbers are mentioned in the experimental setup sections.
Experiment Setup	Yes	We train a transformer-based VAE (4 layers, 4 heads, latent dimension H = 32) on these designs using a batch size of 4096, an initial learning rate of 10⁻⁴, and a single A40 GPU for 200 epochs. ... Detailed hyperparameters are provided in Tab. 6 and Tab. 7.