Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Convergent Functions, Divergent Forms
Authors: Hyeonseong Jeon, Ainaz Eftekhar, Aaron Walsman, Kuo-Hao Zeng, Ali Farhadi, Ranjay Krishna
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In all experiments, morphologies are evolved for locomotion on flat terrain (FT) using UNIMAL space. Locomotion is a universal and ubiquitous evolutionary pressure across species; it is task-agnostic, avoids overfitting to narrow objectives, and is easy to simulate and reward. We evaluate both the performance and diversity of the final evolved morphologies, comparing them to other evolution-based co-design methods as well as Quality-Diversity approaches (Sec. 4.1). |
| Researcher Affiliation | Collaboration | 1University of Washington 2Seoul National University 3Allen Institute for AI 4Kempner Institute at Harvard University |
| Pseudocode | Yes | Algorithm 1: LOKI |
| Open Source Code | Yes | Code with instructions to reproduce the main results are available on the project website. |
| Open Datasets | Yes | We use UNIMAL [4], an expressive design space encompassing approximately 10^18 unique morphologies with fewer than 10 limbs. |
| Dataset Splits | No | The paper discusses "training durations" and "test tasks" rather than specific dataset splits for training, validation, and testing of a static dataset. For instance, "For each method, the final set of N = 100 evolved morphologies (elites) is independently trained from scratch on each test task using MLP-based policies, with 5 random seeds and training durations of 5, 15, or 20 million steps depending on task difficulty." |
| Hardware Specification | Yes | We train a transformer-based VAE (4 layers, 4 heads, latent dimension H = 32) on these designs using a batch size of 4096, an initial learning rate of 10^-4, and a single A40 GPU for 200 epochs. ... Training is distributed across six A40 GPUs, with each GPU handling 6–7 cluster-specific policies in parallel (more details in the Appendix H). |
| Software Dependencies | No | The paper refers to "Pytorch" in reference [65], but does not specify a version number used in the experiments. No other specific software dependencies with version numbers are mentioned in the experimental setup sections. |
| Experiment Setup | Yes | We train a transformer-based VAE (4 layers, 4 heads, latent dimension H = 32) on these designs using a batch size of 4096, an initial learning rate of 10^-4, and a single A40 GPU for 200 epochs. ... Detailed hyperparameters are provided in Tab. 6 and Tab. 7. |