Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Collapsing Taylor Mode Automatic Differentiation
Authors: Felix Dangel, Tim Siebert, Marius Zeinhofer, Andrea Walther
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement our collapsing procedure and evaluate it on popular PDE operators, confirming it accelerates Taylor mode and outperforms nested backpropagation. ... We empirically demonstrate that collapsing Taylor mode accelerates standard Taylor mode. ... Here, we describe our implementation of the Taylor mode collapsing process and empirically validate its performance improvements on the previously discussed operators. ... Results. Figure 5 visualizes the growth in computational resources w.r.t. the batch size (exact) and random samples (stochastic) for fixed dimensions D. Runtime and memory increase linearly in both, as expected. We quantify the results by fitting linear functions and reporting their slopes (i.e., time and memory added per datum/sample) in table 1. |
| Researcher Affiliation | Academia | Felix Dangel Vector Institute Toronto, Canada EMAIL Tim Siebert Humboldt-Universität zu Berlin and Zuse Institute Berlin Berlin, Germany EMAIL Marius Zeinhofer ETH Zurich Zurich, Switzerland, EMAIL Andrea Walther Humboldt-Universität zu Berlin and Zuse Institute Berlin Berlin, Germany EMAIL |
| Pseudocode | No | The paper includes figures (B6, C7, C8) that visually illustrate steps of graph transformation or propagation, but these are not formatted as structured pseudocode or algorithm blocks with numbered steps typically found in an algorithm description. |
| Open Source Code | Yes | We implement a Taylor mode library² for PyTorch [22] that realizes the graph simplifications with torch.fx [26]. ... ² Available at https://github.com/f-dangel/torch-jet. |
| Open Datasets | No | The paper does not mention or use any specific, named datasets (e.g., CIFAR-10, ImageNet). It evaluates the performance of derivative computation techniques on PDE operators implemented via MLPs, rather than on a dataset-driven learning task. |
| Dataset Splits | No | Since no external datasets are explicitly mentioned or used for traditional training/testing, the concept of dataset splits is not applicable to the experimental setup described in the paper. |
| Hardware Specification | Yes | We compare standard Taylor mode with collapsed Taylor mode and nested 1st-order AD on an Nvidia RTX 6000 GPU with 24 GiB memory. |
| Software Dependencies | No | The paper mentions PyTorch [22], torch.fx [26], and JAX [3] as key software components used or referenced. However, it does not provide specific version numbers for these libraries, which is required for a reproducible software description. |
| Experiment Setup | Yes | As common for PINNs [e.g., 6, 27], we use a 5-layer MLP $f_\theta\colon D \to 768 \to 768 \to 512 \to 512 \to 1$ with tanh activations and trainable parameters θ, and compute the PDE operators on batches of size N. We measure three performance metrics: (1) Runtime reports the smallest execution time of 50 repetitions. (2) Peak memory (non-differentiable) measures the maximum allocated GPU memory when computing the PDE operator's value (e.g., used in VMC [24]) inside a torch.no_grad context. (3) Peak memory (differentiable) is the maximum memory usage when computing the PDE operator inside a torch.enable_grad context, which allows backpropagation to θ (required for training PINNs, or alternative VMC works [30, 32]). |
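
The runtime protocol quoted above (smallest execution time of 50 repetitions, then a linear fit whose slope gives the time added per datum) can be sketched roughly as follows. This is a minimal standard-library illustration, not the paper's benchmark code: `best_of`, `fit_slope`, and `toy_workload` are hypothetical names, and the CPU loop in `toy_workload` is only a stand-in for evaluating a PDE operator on a batch.

```python
import time


def best_of(fn, repetitions=50):
    """Return the smallest wall-clock time (in seconds) over `repetitions` calls."""
    best = float("inf")
    for _ in range(repetitions):
        start = time.perf_counter()
        fn()
        best = min(best, time.perf_counter() - start)
    return best


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys against xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


def toy_workload(batch_size):
    # Hypothetical stand-in for computing a PDE operator on `batch_size` points.
    return sum(i * i for i in range(1000 * batch_size))


batch_sizes = [1, 2, 4, 8]
# Bind n as a default argument so each lambda times its own batch size.
runtimes = [best_of(lambda n=n: toy_workload(n)) for n in batch_sizes]
slope = fit_slope(batch_sizes, runtimes)
print(f"time added per datum: {slope:.2e} s")
```

Taking the minimum rather than the mean reduces the influence of one-off delays. On a GPU, a faithful version would presumably also synchronize the device before reading the clock (for example with `torch.cuda.synchronize()`) and could read peak memory via `torch.cuda.max_memory_allocated()`; whether the authors do exactly this is not stated in the excerpt.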