Backpropagation with Callbacks: Foundations for Efficient and Expressive Differentiable Programming
Authors: Fei Wang, James Decker, Xilun Wu, Gregory Essertel, Tiark Rompf
NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 4 (Evaluation and Case Studies): As shown in Figure 6, we compared Lantern with TensorFlow and PyTorch (DyNet implementation was only introduced for TreeLSTM for the benefit of autobatching). The training loss (not shown) in all architectures had similar decay, indicating that Lantern correctly implements backward propagation. We elected to only gauge the runtime of training loops, as that is the majority of computation. |
| Researcher Affiliation | Academia | Fei Wang Purdue University West Lafayette, IN 47906 wang603@purdue.edu James Decker Purdue University West Lafayette, IN 47906 decker31@purdue.edu Xilun Wu Purdue University West Lafayette, IN 47906 wu636@purdue.edu Grégory Essertel Purdue University West Lafayette, IN, 47906 gesserte@purdue.edu Tiark Rompf Purdue University West Lafayette, IN, 47906 tiark@purdue.edu |
| Pseudocode | Yes | Figure 2: Automatic Differentiation in Scala: reverse-mode AD by callbacks and operator overloading (left), and the grad function definition and use case (right). ... Figure 3: Program Transformation between direct style (left) and CPS (right). ... Figure 4: Automatic Differentiation in Scala: reverse-mode using delimited continuations with shift/reset operators (left), and grad function definition and use case (right). (An illustrative sketch of the callback style appears after this table.) |
| Open Source Code | Yes | In this section, we validate our design by implementing and evaluating our prototypic framework, dubbed Lantern. Lantern builds on the code in earlier sections, but supports handling tensor objects (multi-dimension arrays with common linear algebra operations such as element-wise operations with broadcasting, matrix multiplication, and convolution). ... https://github.com/feiwang3311/Lantern |
| Open Datasets | Yes | We would like to give extra attention to the evaluation of TreeLSTM, which is adapted from Sentiment Classification using the dataset from the Stanford Sentiment Treebank (Chuang, 2013) following the work of Tai et al. (2015). |
| Dataset Splits | No | For vanilla RNN and LSTM, we evaluated at batch size 20. The training time for Lantern in both cases is less compared with that of PyTorch, and comparable to that of TensorFlow. For CNN, the evaluation was done at batch size 100... As such, both Lantern and PyTorch were run at batch size 1... The paper mentions batch sizes and training, but does not provide explicit training/validation/test dataset splits (e.g., percentages or sample counts) for reproduction. |
| Hardware Specification | Yes | All experiments were run using a single CPU on a cluster with Intel Xeon Platinum 8168 CPUs at 2.70GHz and 0.75 TB RAM per node. |
| Software Dependencies | No | While some operations are linked to the OpenBLAS implementation, most operations are implemented as simple C++ loops. ... comparing with PyTorch, TensorFlow, and DyNet (Neubig et al., 2017). ... The paper mentions several software components like OpenBLAS, C++, PyTorch, TensorFlow, and DyNet, but it does not provide specific version numbers for any of these dependencies, which are required for reproducibility. |
| Experiment Setup | Yes | For vanilla RNN and LSTM, we evaluated at batch size 20. ... For CNN, the evaluation was done at batch size 100... As such, both Lantern and PyTorch were run at batch size 1... |
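
The paper's Figure 2 illustrates the idea named in the title: reverse-mode AD implemented with operator overloading and explicit callbacks, where each overloaded operator computes its forward value, invokes a callback standing for the rest of the computation, and then accumulates gradients on the way back. The sketch below shows that style in plain Scala; the names `NumR` and `grad` follow the paper's figures, but the exact definitions here are an approximation for illustration, not the authors' code.

```scala
// Minimal sketch of reverse-mode AD via callbacks and operator overloading
// (in the style of the paper's Figure 2; not the authors' implementation).
// NumR carries a value x and an accumulator d for its gradient.
class NumR(val x: Double, var d: Double = 0.0) {
  // Each operator returns a function expecting a callback k that represents
  // the rest of the computation. The forward value y is passed to k; once k
  // returns, the operands' gradients are updated from y.d (the backward pass).
  def +(that: NumR): (NumR => Unit) => Unit = { k =>
    val y = new NumR(x + that.x)
    k(y)
    this.d += y.d
    that.d += y.d
  }
  def *(that: NumR): (NumR => Unit) => Unit = { k =>
    val y = new NumR(x * that.x)
    k(y)
    this.d += that.x * y.d
    that.d += this.x * y.d
  }
}

object CallbackAD {
  // grad seeds the output gradient with 1.0 and reads back the input gradient.
  def grad(f: NumR => (NumR => Unit) => Unit)(x: Double): Double = {
    val in = new NumR(x)
    f(in) { out => out.d = 1.0 }
    in.d
  }

  def main(args: Array[String]): Unit = {
    // d/dx (x*x + x) = 2x + 1, so at x = 3.0 this prints 7.0.
    println(grad(x => k => (x * x) { y => (y + x)(k) })(3.0))
  }
}
```

Figure 4 of the paper expresses the same pattern more concisely using Scala's delimited-continuation operators `shift`/`reset`, which make the callback (the continuation) implicit, and Figure 3 relates the two via the transformation between direct style and CPS.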