Developing Bug-Free Machine Learning Systems With Formal Mathematics

Authors: Daniel Selsam, Percy Liang, David L. Dill

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | To demonstrate practical feasibility empirically, we trained an Auto-Encoding Variational Bayes (AEVB) model (Kingma & Welling, 2014) on MNIST using ADAM (Kingma & Ba, 2014) and found the performance comparable to training the same model in TensorFlow. ... Certigrad is efficient. As an experiment, we trained an AEVB model with a 2-layer encoding network and a 2-layer decoding network on MNIST using the optimization procedure ADAM (Kingma & Ba, 2014), and compared both the expected loss and the running time of our system at each iteration against the same model and optimization procedure in TensorFlow, both running on 2 CPU cores. We find that our expected losses decrease at the same rate, and that while Certigrad takes 25% longer for the first few iterations, its performance is more stable over time and it eventually surpasses TensorFlow (Figure 6).
Researcher Affiliation | Academia | Daniel Selsam¹ Percy Liang¹ David L. Dill¹ ... ¹Stanford University, Stanford, CA.
Pseudocode | Yes | We can write standard functional programs in Lean, such as softplus: `def splus (x : R) : R := log (1 + exp x)` ... Suppose we try to write a program to compute the gradient of the softplus function as follows: `def gsplus (x : R) : R := 1 / (1 + exp x)` (see the corrected sketch below the table)
Open Source Code | Yes | The complete development can be found at www.github.com/dselsam/certigrad.
Open Datasets | Yes | we trained an Auto-Encoding Variational Bayes (AEVB) model (Kingma & Welling, 2014) on MNIST using ADAM (Kingma & Ba, 2014)
Dataset Splits | No | The paper mentions training on MNIST but does not specify explicit training/validation/test splits or the use of a validation set.
Hardware Specification | No | The paper mentions running experiments on '2 CPU cores' but does not specify the CPU model, manufacturer, clock speed, or any other hardware details.
Software Dependencies | No | The paper mentions several software systems (the Lean theorem prover, Eigen, TensorFlow, Theano) and cites their initial publications. The Eigen bibliography entry specifies 'Eigen v3', but version numbers for the other key dependencies used in the experiments (e.g., the TensorFlow or Lean version) are not given in the main text.
Experiment Setup | No | The paper states that an AEVB model with a 2-layer encoding network and a 2-layer decoding network was trained on MNIST using the ADAM (Kingma & Ba, 2014) optimization procedure, but does not provide hyperparameter values such as the learning rate, batch size, or number of epochs.
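For reference, the `gsplus` in the quoted Pseudocode excerpt computes 1 / (1 + exp x), whereas the derivative of softplus is the sigmoid: d/dx log(1 + exp x) = exp x / (1 + exp x) = 1 / (1 + exp (-x)). The minimal sketch below writes out the corrected gradient in the quoted notation; it assumes the paper's library supplies the real type `R` with `log` and `exp`, which the excerpt does not show. A sign slip of exactly this kind is the sort of bug Certigrad's machine-checked gradient proofs are designed to catch.

```lean
-- Sketch in the quoted notation; assumes the paper's library supplies
-- the real type R with log and exp (not shown in the excerpt).
def splus (x : R) : R := log (1 + exp x)

-- Corrected gradient of softplus. Note the negation on x:
--   d/dx log (1 + exp x) = exp x / (1 + exp x) = 1 / (1 + exp (-x)).
-- The quoted gsplus, 1 / (1 + exp x), omits this negation and so
-- computes sigmoid (-x) rather than the true derivative.
def gsplus (x : R) : R := 1 / (1 + exp (-x))
```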