Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Multi-objective Differentiable Neural Architecture Search
Authors: Rhea Sukthanker, Arber Zela, Benedikt Staffler, Samuel Dooley, Josif Grabocka, Frank Hutter
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments involving up to 19 hardware devices and 3 different objectives demonstrate the effectiveness and scalability of our method. Finally, we show that, without any additional costs, our method outperforms existing MOO NAS methods across a broad range of qualitatively different search spaces and datasets, including MobileNetV3 on ImageNet-1k, an encoder-decoder transformer space for machine translation and a decoder-only space for language modelling. |
| Researcher Affiliation | Collaboration | 1 University of Freiburg, 2 Bosch Center for AI, 3 Meta, 4 University of Technology Nuremberg, 5 ELLIS Institute Tübingen |
| Pseudocode | Yes | Algorithm 1: MODNAS. Data: $\mathcal{D}_{train}$; $\mathcal{D}_{valid}$; supernetwork; device features $\{d_t\}_{t=1}^{T}$; meta-hypernetwork $H_\Phi$; number of objectives $M$; architect $\Lambda$; learning rates $\xi_1, \xi_2$. |
| Open Source Code | Yes | To facilitate reproducibility, we provide our code in https://github.com/automl/modnas. |
| Open Datasets | Yes | We evaluate MODNAS on 4 search spaces: (1) NAS-Bench-201 (Dong & Yang, 2020) with 19 devices and the CIFAR-10 dataset; (2) MobileNetV3 from Once-for-All (OFA) (Cai et al., 2020) with 12 devices and the ImageNet-1k dataset; (3) Hardware-Aware Transformer (HAT) (Wang et al., 2020b) on the machine translation benchmark WMT'14 En-De across 3 different hardware devices; (4) HW-GPT-Bench (Sukthanker et al., 2024), a GPT-2-based search space used for language modeling on OpenWebText (Gokaslan & Cohen, 2019) across 8 devices. |
| Dataset Splits | Yes | $\mathbf{L}^{train}_t$ and $\mathbf{L}^{valid}_t$ are the vectors with all $M$ loss functions evaluated on the train and validation splits of $\mathcal{D}$, used in the lower- and upper-level problems of (4), respectively. |
| Hardware Specification | Yes | We run the MODNAS search (see Appendix D for more details on the search hyperparameters), as described in Algorithm 1, for 100 epochs (22 GPU hours on a single NVIDIA RTX 2080 Ti) and show in Figure 3 the hypervolume (HV) of the evaluated Pareto front in comparison to the baselines, for which we allocate the same search time budget across all devices, equivalent to the MODNAS search plus evaluation. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | In Table 2, we show the search hyperparameters and their corresponding values we use to conduct our experiments with MODNAS. For the convolutional spaces we subtract a cosine similarity penalty from the scalarized loss following (Ruchte & Grabocka, 2021): |
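
The Experiment Setup row mentions subtracting a cosine similarity penalty from the scalarized loss, following Ruchte & Grabocka (2021). The sketch below illustrates that idea on plain Python floats, assuming a linear scalarization $\mathbf{r}^\top \mathbf{L}$ of the $M$ per-objective losses with a preference vector $\mathbf{r}$, and a penalty of the form $\lambda \cos(\mathbf{r}, \mathbf{L})$. The function names and the weight `lam` are hypothetical; the exact formula and $\lambda$ used by MODNAS are not given here, and a real implementation would operate on autograd tensors so that gradients flow through both terms.

```python
import math
from typing import Sequence


def linear_scalarization(losses: Sequence[float], prefs: Sequence[float]) -> float:
    """Weighted sum r^T L of the M per-objective losses (assumed scalarization)."""
    if len(losses) != len(prefs):
        raise ValueError("losses and prefs must both have length M")
    return sum(r * l for r, l in zip(prefs, losses))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def penalized_scalarized_loss(
    losses: Sequence[float], prefs: Sequence[float], lam: float = 0.01
) -> float:
    """Scalarized loss minus lam * cos(prefs, losses).

    Subtracting the cosine term rewards loss vectors that point in the same
    direction as the preference vector. `lam` is a hypothetical default.
    """
    return linear_scalarization(losses, prefs) - lam * cosine_similarity(prefs, losses)


if __name__ == "__main__":
    # Two objectives (e.g. task loss and a normalized hardware metric), equal preference.
    print(penalized_scalarized_loss([0.5, 0.5], [0.5, 0.5], lam=0.1))
```

With equal preferences and equal losses, the cosine term is (up to floating-point rounding) 1, so the result is roughly 0.5 - 0.1 = 0.4. When the loss vector is orthogonal to the preference vector, the penalty contributes nothing.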