Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Efficient Architecture Search for Diverse Tasks
Authors: Junhong Shen, Misha Khodak, Ameet Talwalkar
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate DASH on ten tasks spanning a variety of application domains such as PDE solving, protein folding, and heart disease detection. |
| Researcher Affiliation | Academia | Junhong Shen Carnegie Mellon University EMAIL Mikhail Khodak Carnegie Mellon University EMAIL Ameet Talwalkar Carnegie Mellon University EMAIL |
| Pseudocode | Yes | Algorithm 1 DASH |
| Open Source Code | Yes | Our code is made public at https://github.com/sjunhongshen/DASH. |
| Open Datasets | Yes | We evaluate the performance of DASH on diverse tasks using ten datasets from NAS-Bench-360 [4], a benchmark spanning multiple application domains, input dimensions, and learning objectives. |
| Dataset Splits | Yes | Each dataset is preprocessed and split using the NAS-Bench-360 script, with the training set being used for search, hyperparameter tuning, and retraining. Then, we evaluate the performance on a holdout validation set and select the configuration with the best validation score. |
| Hardware Specification | Yes | The entire DASH pipeline can be run on a single NVIDIA V100 GPU, which is also the system that we use to report the runtime cost. |
| Software Dependencies | No | The paper mentions software concepts like 'SGD optimizer' and 'Gumbel Softmax activation', but does not provide specific version numbers for any software dependencies (e.g., Python, PyTorch, CUDA). |
| Experiment Setup | Yes | We use the default SGD optimizer for the WRN backbone and fix the learning rate schedule as well as the gradient clipping threshold for every task. To normalize architecture parameters into a probability distribution, we adopt the soft Gumbel Softmax activation, similar to Xie et al. [18]. |
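
The Experiment Setup row quotes the paper as normalizing architecture parameters into a probability distribution with a soft Gumbel Softmax activation. As a rough illustration of that general technique only, the sketch below perturbs a list of raw scores with Gumbel(0, 1) noise and applies a tempered softmax. It is not taken from the DASH repository: the function name `gumbel_softmax`, the `temperature` parameter, and the example scores in `alpha` are all hypothetical, and the paper's actual temperature and implementation may differ.

```python
import math
import random


def gumbel_softmax(logits, temperature=1.0, rng=None):
    """Soft Gumbel-Softmax (illustrative, not the DASH implementation).

    Adds Gumbel(0, 1) noise to each logit, divides by the temperature,
    and normalizes with a numerically stable softmax.
    """
    rng = rng or random.Random()
    noisy = []
    for a in logits:
        # Gumbel(0, 1) sample: -log(-log(U)) with U ~ Uniform(0, 1).
        # U is clamped away from 0 and 1 so both logarithms stay finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        noisy.append((a - math.log(-math.log(u))) / temperature)
    # Subtract the maximum before exponentiating to avoid overflow.
    m = max(noisy)
    exps = [math.exp(v - m) for v in noisy]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical architecture parameters for three candidate operations.
alpha = [0.5, -1.0, 2.0]
weights = gumbel_softmax(alpha, temperature=0.5, rng=random.Random(0))
```

The returned `weights` are non-negative and sum to 1, so they can be read as a relaxed, differentiable stand-in for picking one operation. Lower temperatures push the distribution closer to a one-hot choice.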