Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Collapsing Taylor Mode Automatic Differentiation

Authors: Felix Dangel, Tim Siebert, Marius Zeinhofer, Andrea Walther

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We implement our collapsing procedure and evaluate it on popular PDE operators, confirming it accelerates Taylor mode and outperforms nested backpropagation. ... We empirically demonstrate that collapsing Taylor mode accelerates standard Taylor mode. ... Here, we describe our implementation of the Taylor mode collapsing process and empirically validate its performance improvements on the previously discussed operators. ... Results. Figure 5 visualizes the growth in computational resources w.r.t. the batch size (exact) and random samples (stochastic) for fixed dimensions D. Runtime and memory increase linearly in both, as expected. We quantify the results by fitting linear functions and reporting their slopes (i.e., time and memory added per datum/sample) in Table 1.
Researcher Affiliation | Academia | Felix Dangel, Vector Institute, Toronto, Canada; Tim Siebert, Humboldt-Universität zu Berlin and Zuse Institute Berlin, Berlin, Germany; Marius Zeinhofer, ETH Zurich, Zurich, Switzerland; Andrea Walther, Humboldt-Universität zu Berlin and Zuse Institute Berlin, Berlin, Germany
Pseudocode | No | The paper includes figures (B6, C7, C8) that visually illustrate steps of the graph transformation or propagation, but these are not formatted as structured pseudocode or algorithm blocks with numbered steps, as typically found in an algorithm description.
Open Source Code | Yes | We implement a Taylor mode library² for PyTorch [22] that realizes the graph simplifications with torch.fx [26]. ... ² Available at https://github.com/f-dangel/torch-jet.
Open Datasets | No | The paper does not mention or use any specific, named datasets (e.g., CIFAR-10, ImageNet). It evaluates the performance of derivative computation techniques on PDE operators evaluated on MLPs, rather than on a dataset-driven learning task.
Dataset Splits | No | Since no external datasets are used for conventional training and testing, dataset splits do not apply to the experimental setup described in the paper.
Hardware Specification | Yes | We compare standard Taylor mode with collapsed Taylor mode and nested 1st-order AD on an Nvidia RTX 6000 GPU with 24 GiB memory.
Software Dependencies | No | The paper mentions PyTorch [22], torch.fx [26], and JAX [3] as key software components used or referenced. However, it does not provide specific version numbers for these libraries, which a reproducible software description requires.
Experiment Setup | Yes | As common for PINNs [e.g., 6, 27], we use a 5-layer MLP $f_\theta: \mathbb{R}^D \to \mathbb{R}^{768} \to \mathbb{R}^{768} \to \mathbb{R}^{512} \to \mathbb{R}^{512} \to \mathbb{R}$ with tanh activations and trainable parameters θ, and compute the PDE operators on batches of size N. We measure three performance metrics: (1) Runtime reports the smallest execution time of 50 repetitions. (2) Peak memory (non-differentiable) measures the maximum allocated GPU memory when computing the PDE operator's value (e.g., used in VMC [24]) inside a torch.no_grad context. (3) Peak memory (differentiable) is the maximum memory usage when computing the PDE operator inside a torch.enable_grad context, which allows backpropagation to θ (required for training PINNs, or alternative VMC works [30, 32]).
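The benchmarking protocol quoted in the Research Type and Experiment Setup entries can be sketched with the Python standard library alone. The helpers below (`mlp_widths`, `num_parameters`, `min_runtime`, `fit_slope`) are hypothetical and are not part of torch-jet. They list the MLP layer widths, count trainable parameters under the assumption that every layer is affine with a bias (the excerpt does not state this), take the minimum time over 50 repetitions, and fit a least-squares line whose slope plays the role of the per-datum cost reported in Table 1. The example runtimes are synthetic, not results from the paper, and peak-memory measurement is omitted because it requires PyTorch and a GPU.

```python
import time
from typing import Callable, List, Sequence, Tuple


def mlp_widths(D: int) -> List[int]:
    """Layer widths of the 5-layer MLP: D -> 768 -> 768 -> 512 -> 512 -> 1."""
    return [D, 768, 768, 512, 512, 1]


def num_parameters(D: int) -> int:
    """Count trainable parameters, ASSUMING each layer is affine with a bias."""
    widths = mlp_widths(D)
    # Each layer contributes a weight matrix (w_in * w_out) plus a bias (w_out).
    return sum(w_in * w_out + w_out for w_in, w_out in zip(widths[:-1], widths[1:]))


def min_runtime(fn: Callable[[], object], repeats: int = 50) -> float:
    """Smallest wall-clock time in seconds over `repeats` calls to `fn`."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn()  # for asynchronous GPU work, fn must synchronize before returning
        best = min(best, time.perf_counter() - start)
    return best


def fit_slope(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    print(num_parameters(5))  # 1252097 under the bias assumption
    # Synthetic (not measured) runtimes growing by 0.25 ms per datum.
    batch_sizes = [1, 2, 4, 8]
    runtimes_ms = [0.75, 1.0, 1.5, 2.5]
    slope, intercept = fit_slope(batch_sizes, runtimes_ms)
    print(f"{slope:.2f} ms per datum")
```

Taking the minimum over repetitions, rather than the mean, filters out one-off delays such as warm-up or scheduler noise. Because CUDA kernels launch asynchronously, a GPU callable timed this way would need to call `torch.cuda.synchronize()` before returning.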