Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
GradSign: Model Performance Inference with Theoretical Insights
Authors: Zhihao Zhang, Zhihao Jia
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluation on seven NAS benchmarks across three training datasets shows that GradSign generalizes well to real-world neural networks and consistently outperforms state-of-the-art gradient-based methods for MPI evaluated by Spearman's ρ and Kendall's Tau. |
| Researcher Affiliation | Academia | Zhihao Zhang Carnegie Mellon University EMAIL Zhihao Jia Carnegie Mellon University EMAIL |
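| Pseudocode | Yes | Algorithm 1: GradSign. Result: GradSign score $\tau_f$ for a function class $f_\theta$. Given $S = \{(x_i, y_i)\}_{i \in [n]}$, randomly select initialization point $\theta_0$; initialize $g[n, m]$; for $i = 1, 2, \dots, n$ do: for $k = 1, 2, \dots, m$ do: $g[i, k] = \operatorname{sign}\big(\big[\nabla_\theta l(f_\theta(x_i), y_i)\big\rvert_{\theta_0}\big]_k\big)$; end; end; $\tau_f = \sum_k \big\lvert \sum_i g[i, k] \big\rvert$; return $\tau_f$. |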
| Open Source Code | Yes | Code is available at https://github.com/JackFram/GradSign |
| Open Datasets | Yes | Evaluation on seven NAS benchmarks (i.e., NAS-Bench-101, NAS-Bench-201, and five design spaces of NDS) across three datasets (i.e., CIFAR-10, CIFAR-100, and ImageNet16-120) |
| Dataset Splits | Yes | Table 5: Mean ± std accuracy evaluated on NAS-Bench-201. All results are averaged over 500 runs. All searches are conducted on CIFAR-10 while the selected architectures are evaluated on CIFAR-10, CIFAR-100, and ImageNet16-120. N in parentheses is the number of networks sampled in each run. |
| Hardware Specification | Yes | The hardwares we used were Amazon EC2 C5 instances with no GPU involved and p3 instance with one V100 Tensor Core GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2017) and TensorFlow (Abadi et al., 2016)' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We use a randomly sampled subset with approximately 4500 architectures of the original search space and a batch size of 64 in this experiment. |
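
The pseudocode quoted in the table can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes the score sums, over parameters k, the absolute value of the per-sample gradient signs summed over samples i. The toy linear model, the helper names (`gradsign_score`, `linear_sq_loss_grad`), the sample data, and the zero initialization (used here only to keep the output deterministic, whereas the algorithm selects θ0 randomly) are all hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]


def sign(v: float) -> int:
    """Return -1, 0, or +1 according to the sign of v."""
    return (v > 0) - (v < 0)


def gradsign_score(
    per_sample_grad: Callable[[Vector, Vector, float], Vector],
    theta0: Vector,
    samples: Sequence[Sample],
) -> int:
    """Sum over parameters k of |sum over samples i of sign(grad_i[k])|."""
    column_sums = [0] * len(theta0)
    for x, y in samples:
        grad = per_sample_grad(theta0, x, y)  # gradient of one sample's loss at theta0
        for k, g_k in enumerate(grad):
            column_sums[k] += sign(g_k)
    return sum(abs(s) for s in column_sums)


def linear_sq_loss_grad(theta: Vector, x: Vector, y: float) -> Vector:
    """Gradient of 0.5 * (theta . x - y)^2 w.r.t. theta (hypothetical toy model)."""
    residual = sum(t * xi for t, xi in zip(theta, x)) - y
    return [residual * xi for xi in x]


# Hypothetical toy data: three samples, two parameters.
samples = [([1.0, 2.0], 1.0), ([2.0, -1.0], 0.5), ([-1.0, 1.0], -2.0)]
theta0 = [0.0, 0.0]  # assumption: fixed init for a reproducible example
score = gradsign_score(linear_sq_loss_grad, theta0, samples)
print(score)  # 4
```

In this toy case all three samples agree on the gradient sign of the first parameter (contributing 3), while they disagree on the second (contributing 1), so the score is 4 out of a maximum of n × m = 6. Higher scores indicate more agreement among per-sample gradient directions at the initialization point.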