Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning to superoptimize programs
Authors: Rudy Bunel, Alban Desmaison, M. Pawan Kumar, Philip H.S. Torr, Pushmeet Kohli
ICLR 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on benchmarks comprising of automatically generated as well as existing (Hacker's Delight) programs show that the proposed method is able to significantly outperform state of the art approaches for code super-optimization. |
| Researcher Affiliation | Collaboration | Rudy Bunel¹, Alban Desmaison¹, M. Pawan Kumar¹,² & Philip H.S. Torr¹; ¹Department of Engineering Science, University of Oxford; ²Alan Turing Institute; Oxford, UK; EMAIL, EMAIL. Pushmeet Kohli, Microsoft Research, Redmond, WA 98052, USA; EMAIL |
| Pseudocode | Yes | Figure 5: Generative Model of a Transformation. The figure presents a detailed pseudocode block describing the `proposal` function and different move types. |
| Open Source Code | No | The paper states that its system is built on top of the Stoke super-optimizer and uses the Torch framework, but it does not provide any link or explicit statement about making its own source code available for the methodology described. |
| Open Datasets | Yes | The first is based on the Hacker's delight (Warren, 2002) corpus, a collection of twenty five bit-manipulation programs, used as benchmark in program synthesis (Gulwani et al., 2011; Jha et al., 2010; Schkufza et al., 2013). |
| Dataset Splits | No | The paper describes training and test sets by dividing the Hacker's Delight dataset into even-numbered tasks for training and odd-numbered tasks for evaluation, and also states, "We generate 600 of these programs, 300 that we use as a training set for the optimizer to learn over and 300 that we keep as a test set." However, it does not explicitly mention a distinct validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or cloud instance types used for running its experiments. |
| Software Dependencies | No | The paper mentions that the implementation uses the Torch framework (Collobert et al., 2011) and the Adam optimizer (Kingma & Ba, 2015), but it does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | The optimization is performed by stochastic gradient descent, using the Adam (Kingma & Ba, 2015) optimizer. For each estimate of the gradient, we draw 100 samples for our estimator. The values of the hyperparameters used are given in Appendix A. |
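
The two train/test partitions described in the Dataset Splits row can be sketched as follows. This is a minimal illustration, not the authors' code: the function names, the 1-indexed task numbering, and the choice to take the first 300 generated programs as the training set are all assumptions made here, since the quoted text gives only the even/odd rule and the 300/300 sizes.

```python
# Hypothetical sketch of the splits quoted above; names and ordering are
# illustrative assumptions, not taken from the paper's implementation.

def split_hackers_delight(num_tasks=25):
    """Split 1-indexed task numbers: even for training, odd for evaluation."""
    tasks = range(1, num_tasks + 1)
    train = [t for t in tasks if t % 2 == 0]
    test = [t for t in tasks if t % 2 == 1]
    return train, test


def split_generated(programs, num_train=300):
    """Assumed rule: the first num_train programs train, the rest test."""
    return programs[:num_train], programs[num_train:]


hd_train, hd_test = split_hackers_delight()
# Placeholder identifiers standing in for the 600 generated programs.
gen_train, gen_test = split_generated([f"prog_{i}" for i in range(600)])
print(len(hd_train), len(hd_test), len(gen_train), len(gen_test))
```

Neither partition reserves a validation subset, which is consistent with the "No" result recorded for Dataset Splits.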