Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Multi-objective Differentiable Neural Architecture Search

Authors: Rhea Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments involving up to 19 hardware devices and 3 different objectives demonstrate the effectiveness and scalability of our method. Finally, we show that, without any additional costs, our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k, an encoder-decoder transformer space for machine translation and a decoder-only space for language modelling.
Researcher Affiliation	Collaboration	1 University of Freiburg, 2 Bosch Center for AI, 3 Meta, 4 University of Technology Nuremberg, 5 ELLIS Institute Tübingen
Pseudocode	Yes	Algorithm 1: MODNAS. Data: $D_{\text{train}}$; $D_{\text{valid}}$; Supernetwork; device features $\{d_t\}_{t=1}^{T}$; Meta Hypernetwork $H_\Phi$; nr. of objectives $M$; Architect $\Lambda$; learning rates $\xi_1, \xi_2$.
Open Source Code	Yes	To facilitate reproducibility, we provide our code in https://github.com/automl/modnas.
Open Datasets	Yes	We evaluate MODNAS on 4 search spaces: (1) NAS-Bench-201 (Dong & Yang, 2020) with 19 devices and CIFAR-10 dataset; (2) MobileNetV3 from Once-for-All (OFA) (Cai et al., 2020) with 12 devices and ImageNet-1k dataset; (3) Hardware-Aware-Transformer (HAT) (Wang et al., 2020b) on the machine translation benchmark WMT'14 En-De across 3 different hardware devices; (4) HW-GPT-Bench (Sukthanker et al., 2024) a GPT-2 based search space used for language modeling on the OpenWebText (Gokaslan & Cohen, 2019) across 8 devices.
Dataset Splits	Yes	$\mathbf{L}_t^{\text{train}}$ and $\mathbf{L}_t^{\text{valid}}$ are the vectors with all $M$ loss functions evaluated on the train and validation splits of $D$, used in the lower- and upper-level problems of (4), respectively.
Hardware Specification	Yes	We run the MODNAS search (see Appendix D for more details on the search hyperparameters), as described in Algorithm 1, for 100 epochs (22 GPU hours on a single NVidia RTX2080Ti) and show the HV in Figure 3 of the evaluated Pareto front in comparison to the baselines, for which we allocate the same search time budget across all devices equivalent to the MODNAS search + evaluation.
Software Dependencies	No	The paper does not provide specific software dependencies with version numbers.
Experiment Setup	Yes	In Table 2, we show the search hyperparameters and their corresponding values we use to conduct our experiments with MODNAS. For the convolutional spaces we subtract a cosine similarity penalty from the scalarized loss following (Ruchte & Grabocka, 2021):
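
The quoted excerpt stops before the penalty formula, so the exact expression used in the paper is not shown here. As a rough illustration of the general idea only, the sketch below assumes a linear scalarization of the loss vector with a preference vector and subtracts a weighted cosine similarity between the two. The function name, the `penalty_weight` parameter, and the linear scalarization are assumptions made for this example and are not taken from the MODNAS code or from Ruchte & Grabocka (2021).

```python
import math


def scalarized_loss_with_cosine_penalty(losses, preferences, penalty_weight=1.0):
    """Hypothetical sketch: linear scalarization minus a cosine similarity term.

    losses:         one loss value per objective
    preferences:    one non-negative preference weight per objective
    penalty_weight: assumed hyperparameter scaling the cosine term
    """
    if len(losses) != len(preferences):
        raise ValueError("losses and preferences must have the same length")

    # Linear scalarization: preference-weighted sum of the objectives.
    dot = sum(r * l for r, l in zip(preferences, losses))

    # Cosine similarity between the preference vector and the loss vector.
    norm_losses = math.sqrt(sum(l * l for l in losses))
    norm_prefs = math.sqrt(sum(r * r for r in preferences))
    if norm_losses == 0.0 or norm_prefs == 0.0:
        cosine = 0.0  # similarity is undefined for a zero vector; treat as no reward
    else:
        cosine = dot / (norm_losses * norm_prefs)

    # Subtracting the similarity rewards loss vectors aligned with the preferences.
    return dot - penalty_weight * cosine


if __name__ == "__main__":
    print(scalarized_loss_with_cosine_penalty([0.8, 0.3], [0.5, 0.5], penalty_weight=0.1))
```

In this sketch a loss vector that points in the same direction as the preference vector receives the largest reduction, which is the intuition behind using such a term to spread solutions along the Pareto front.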