Developing Bug-Free Machine Learning Systems With Formal Mathematics

Authors: Daniel Selsam, Percy Liang, David L. Dill

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | To demonstrate practical feasibility empirically, we trained an Auto-Encoding Variational Bayes (AEVB) model (Kingma & Welling, 2014) on MNIST using ADAM (Kingma & Ba, 2014) and found the performance comparable to training the same model in TensorFlow. ... Certigrad is efficient. As an experiment, we trained an AEVB model with a 2-layer encoding network and a 2-layer decoding network on MNIST using the optimization procedure ADAM (Kingma & Ba, 2014), and compared both the expected loss and the running time of our system at each iteration against the same model and optimization procedure in TensorFlow, both running on 2 CPU cores. We find that our expected losses decrease at the same rate, and that while Certigrad takes 25% longer for the first few iterations, its performance is more stable over time and it eventually surpasses TensorFlow (Figure 6).
Researcher Affiliation | Academia | Daniel Selsam¹ Percy Liang¹ David L. Dill¹ ... ¹Stanford University, Stanford, CA.
Pseudocode | Yes | We can write standard functional programs in Lean, such as softplus: `def splus (x : R) : R := log (1 + exp x)` ... Suppose we try to write a program to compute the gradient of the softplus function as follows: `def gsplus (x : R) : R := 1 / (1 + exp x)` (see the corrected sketch below the table)
Open Source Code | Yes | The complete development can be found at www.github.com/dselsam/certigrad.
Open Datasets | Yes | we trained an Auto-Encoding Variational Bayes (AEVB) model (Kingma & Welling, 2014) on MNIST using ADAM (Kingma & Ba, 2014)
Dataset Splits | No | The paper mentions training on MNIST but does not specify explicit training/validation/test splits or the use of a validation set.
Hardware Specification | No | The paper mentions running experiments on '2 CPU cores' but does not specify the CPU model, manufacturer, clock speed, or any other hardware details.
Software Dependencies | No | The paper mentions several software systems (the Lean theorem prover, Eigen, TensorFlow, Theano) and cites their initial publications. The Eigen bibliography entry specifies 'Eigen v3', but version numbers for the other key dependencies used in the experiments (e.g., the TensorFlow or Lean version) are not given in the main text.
Experiment Setup | No | The paper states that an AEVB model with a 2-layer encoding network and a 2-layer decoding network was trained on MNIST using the ADAM (Kingma & Ba, 2014) optimization procedure, but does not provide hyperparameter values such as the learning rate, batch size, or number of epochs.
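For reference, the `gsplus` in the quoted Pseudocode excerpt computes 1 / (1 + exp x), whereas the derivative of softplus is the sigmoid: d/dx log(1 + exp x) = exp x / (1 + exp x) = 1 / (1 + exp (-x)). The minimal sketch below writes out the corrected gradient in the quoted notation; it assumes the paper's library supplies the real type `R` with `log` and `exp`, which the excerpt does not show. A sign slip of exactly this kind is the sort of bug Certigrad's machine-checked gradient proofs are designed to catch.

```lean
-- Sketch in the quoted notation; assumes the paper's library supplies
-- the real type R with log and exp (not shown in the excerpt).
def splus (x : R) : R := log (1 + exp x)

-- Corrected gradient of softplus. Note the negation on x:
--   d/dx log (1 + exp x) = exp x / (1 + exp x) = 1 / (1 + exp (-x)).
-- The quoted gsplus, 1 / (1 + exp x), omits this negation and so
-- computes sigmoid (-x) rather than the true derivative.
def gsplus (x : R) : R := 1 / (1 + exp (-x))
```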