Developing Bug-Free Machine Learning Systems With Formal Mathematics
Authors: Daniel Selsam, Percy Liang, David L. Dill
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To demonstrate practical feasibility empirically, we trained an Auto-Encoding Variational Bayes (AEVB) model (Kingma & Welling, 2014) on MNIST using ADAM (Kingma & Ba, 2014) and found the performance comparable to training the same model in TensorFlow. ... Certigrad is efficient. As an experiment, we trained an AEVB model with a 2-layer encoding network and a 2-layer decoding network on MNIST using the optimization procedure ADAM (Kingma & Ba, 2014), and compared both the expected loss and the running time of our system at each iteration against the same model and optimization procedure in TensorFlow, both running on 2 CPU cores. We find that our expected losses decrease at the same rate, and that while Certigrad takes 25% longer for the first few iterations, its performance is more stable over time and it eventually surpasses TensorFlow (Figure 6). |
| Researcher Affiliation | Academia | Daniel Selsam¹, Percy Liang¹, David L. Dill¹ ... ¹Stanford University, Stanford, CA. |
| Pseudocode | Yes | We can write standard functional programs in Lean, such as softplus: `def splus (x : R) : R := log (1 + exp x)` ... Suppose we try to write a program to compute the gradient of the softplus function as follows: `def gsplus (x : R) : R := 1 / (1 + exp x)` (a note on the correct gradient follows this table) |
| Open Source Code | Yes | The complete development can be found at www.github.com/dselsam/certigrad. |
| Open Datasets | Yes | we trained an Auto-Encoding Variational Bayes (AEVB) model (Kingma & Welling, 2014) on MNIST using ADAM (Kingma & Ba, 2014) |
| Dataset Splits | No | The paper mentions training on MNIST but does not specify explicit training/validation/test splits or the use of a validation set. |
| Hardware Specification | No | The paper mentions running experiments on '2 CPU cores' but does not specify the CPU model, manufacturer, clock speed, or any other specific hardware details. |
| Software Dependencies | No | The paper mentions several software systems (Lean Theorem Prover, Eigen, TensorFlow, Theano) and cites their initial publication years. While the Eigen bibliography entry specifies 'Eigen v3', specific version numbers for all key software dependencies used in the experiments (e.g., TensorFlow version, Lean version) are not provided in the main text. |
| Experiment Setup | No | The paper states that an AEVB model with a '2-layer encoding network and a 2-layer decoding network' was trained on MNIST using the 'ADAM (Kingma & Ba, 2014)' optimization procedure. However, it does not provide specific hyperparameter values such as learning rate, batch size, or number of epochs. (A hedged reproduction sketch with assumed hyperparameters follows this table.) |
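
On the Pseudocode row: the quoted `gsplus` is the paper's example of an easy-to-write but incorrect gradient program, the kind of bug its formal approach is designed to catch. For reference, the correct derivative of `splus` is exp x / (1 + exp x), equivalently 1 / (1 + exp (-x)). A minimal Lean-style sketch (not from the paper; it reuses the quoted `R` and `exp` identifiers):

```lean
-- Sketch only, not from the paper: the mathematically correct gradient of splus.
-- d/dx log (1 + exp x) = exp x / (1 + exp x) = 1 / (1 + exp (-x));
-- the quoted gsplus omits the negation inside exp.
def gsplus' (x : R) : R := 1 / (1 + exp (-x))
```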
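On the Experiment Setup and Dataset Splits rows: the paper fixes the architecture (a 2-layer encoder and 2-layer decoder AEVB) and the optimizer (ADAM) but reports no hyperparameters, so any reproduction must choose its own. The sketch below is one plausible PyTorch configuration; the hidden width, latent dimension, learning rate, batch size, and epoch count are assumptions, not values from the paper, and PyTorch is used only for illustration (the paper's own system is written in Lean, with TensorFlow as the baseline).

```python
# Hedged reproduction sketch of the paper's AEVB-on-MNIST experiment.
# Architecture depth and optimizer follow the paper; every numeric
# hyperparameter below is an assumption (the paper does not report them).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

class AEVB(nn.Module):
    def __init__(self, x_dim=784, h_dim=500, z_dim=20):  # widths assumed
        super().__init__()
        # 2-layer encoding network: x -> h -> (mu, log_var)
        self.enc1 = nn.Linear(x_dim, h_dim)
        self.enc_mu = nn.Linear(h_dim, z_dim)
        self.enc_logvar = nn.Linear(h_dim, z_dim)
        # 2-layer decoding network: z -> h -> x
        self.dec1 = nn.Linear(z_dim, h_dim)
        self.dec2 = nn.Linear(h_dim, x_dim)

    def forward(self, x):
        h = torch.tanh(self.enc1(x))
        mu, logvar = self.enc_mu(h), self.enc_logvar(h)
        # Reparameterization trick from Kingma & Welling (2014).
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        x_hat = torch.sigmoid(self.dec2(torch.tanh(self.dec1(z))))
        return x_hat, mu, logvar

def loss_fn(x, x_hat, mu, logvar):
    # Negative ELBO: reconstruction term plus KL divergence to N(0, I).
    recon = F.binary_cross_entropy(x_hat, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + kl

def train(epochs=10, batch_size=128, lr=1e-3):  # all assumed values
    data = datasets.MNIST("data", train=True, download=True,
                          transform=transforms.ToTensor())
    loader = DataLoader(data, batch_size=batch_size, shuffle=True)
    model = AEVB()
    opt = torch.optim.Adam(model.parameters(), lr=lr)  # ADAM, as in the paper
    for _ in range(epochs):
        for x, _ in loader:
            x = x.view(x.size(0), -1)
            x_hat, mu, logvar = model(x)
            loss = loss_fn(x, x_hat, mu, logvar)
            opt.zero_grad()
            loss.backward()
            opt.step()

if __name__ == "__main__":
    train()
```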