Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Difference-of-Convex Functions Approach to Energy-Based Iterative Reasoning

Authors: Daniel Tschernutter, David Diego Castro, Maciej Kasiński

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	As such, our method offers a leap in computational efficiency, enabling faster inference with theoretical guarantees, and hence unlocking the potential of energy-based models for iterative reasoning applications. In addition, we achieve state-of-the-art or superior performance on continuous reasoning tasks, as demonstrated by our experiments on multiple benchmark datasets from continuous algorithmic reasoning.
Researcher Affiliation	Industry	Daniel Tschernutter (Infermedica, Graz, Austria); David Diego-Castro (Infermedica, Gothenburg, Sweden); Maciej Kasiński (Infermedica, Wrocław, Poland)
Pseudocode	Yes	We now combine our derivations in Section 3 and Assumption 1 to define our algorithm for scalable energy learning via a batched DCA approach named DCAReasoner (footnote 4). Pseudocode is presented in Algorithm 1. Algorithm 1: DCAReasoner: Scalable Energy Learning via Batched DCA
Open Source Code	Yes	Footnote 4: https://github.com/Daniel Tschernutter/DCAReasoner. An implementation of DCAReasoner is publicly available on GitHub.
Open Datasets	Yes	We first evaluate our algorithm on five continuous algorithmic reasoning benchmark datasets from earlier research [20, 21] in Section 5.2. In particular, we make use of the symptom-to-diagnosis dataset for medical reasoning, which is freely available on Hugging Face (footnote 5). It provides a training and test dataset consisting of short texts... Footnote 5: https://huggingface.co/datasets/gretelai/symptom_to_diagnosis; the dataset is licensed under Apache 2.0.
Dataset Splits	Yes	Each of them is evaluated once with the same level of difficulty, i.e., test cases are drawn from the training distribution, and once with a harder level of difficulty, in which test cases are drawn from a problem-specific harder test distribution following [20, 21]. It provides a training (853 examples) and test (212 examples) dataset consisting of short texts... The authors provide a training dataset consisting of 1.8 million Sudoku puzzles as well as a validation set with 0.1 million Sudoku puzzles.
Hardware Specification	Yes	All experiments are performed on an n1-standard-2 Google Cloud instance with 7.5 GB RAM and two NVIDIA T4 GPUs.
Software Dependencies	No	The paper specifies the Adam optimizer and its learning rate and refers to specific code releases of the baselines, but it does not state version numbers for its own implementation's software dependencies (e.g., Python or PyTorch versions) in the provided text.
Experiment Setup	Yes	Model Specifications. For DCAReasoner we set the number of hidden units N_x = 8 and N_y = 4000 for all benchmark datasets. ... We set the convergence tolerance in Algorithm 1 to 10⁻⁵ and set a maximum of 30 DCA iterations. Training. For training, we use the Adam optimizer with a learning rate of 10⁻⁴ as suggested in [20, 21] for all models. We set the batch size to 512 and train each model for a fixed number of 10000 iterations, i.e., gradient steps. Hyperparameters are summarized in Table 3.
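The batched DCA step named in the Pseudocode row, together with the stopping rule from the Experiment Setup row (tolerance 10⁻⁵, at most 30 DCA iterations), can be illustrated with a minimal sketch. This is the generic difference-of-convex algorithm (DCA) applied to a batch of independent problems, not the authors' DCAReasoner energy model: the toy objective g(x) = (a/2)||x - c||^2, h(x) = b * sum(log cosh(x)), and the names `batched_dca`, `solve_convex_subproblem`, `a`, `b`, `c` are hypothetical. Each iteration linearizes h at the current iterate and solves the remaining convex problem in closed form; rows that have converged are frozen while the rest keep iterating.

```python
import numpy as np


def batched_dca(grad_h, solve_convex_subproblem, x0, tol=1e-5, max_iter=30):
    """Generic batched DCA for f(x) = g(x) - h(x), with g and h convex.

    Each row of x0 is an independent problem. Every iteration linearizes h at
    the current iterate and solves x_new = argmin_x g(x) - <grad_h(x_k), x>
    for all rows at once. A row is frozen once its update norm falls below
    tol; the loop stops when every row is frozen or after max_iter steps.
    Returns the final iterates and the number of iterations performed.
    """
    x = np.array(x0, dtype=float)                 # copy, shape (batch, dim)
    active = np.ones(x.shape[0], dtype=bool)      # rows still iterating
    for k in range(1, max_iter + 1):
        x_new = solve_convex_subproblem(grad_h(x))
        step = np.linalg.norm(x_new - x, axis=1)
        x[active] = x_new[active]                 # update unconverged rows only
        active &= step >= tol                     # freeze rows that converged
        if not active.any():
            return x, k
    return x, max_iter


# Hypothetical toy DC objective for batch row i (not the paper's energy):
#   g(x) = (a/2) * ||x - c_i||^2   (convex, closed-form subproblem)
#   h(x) = b * sum(log(cosh(x)))   (convex, gradient b * tanh(x))
a, b = 2.0, 1.0
rng = np.random.default_rng(0)
c = rng.normal(size=(4, 3))                       # one target per batch row


def grad_h(x):
    return b * np.tanh(x)


def solve_convex_subproblem(s):
    # argmin_x (a/2)||x - c||^2 - <s, x>  =>  a(x - c) = s  =>  x = c + s / a
    return c + s / a


def energy(x):
    # f(x) = g(x) - h(x), evaluated per batch row
    return 0.5 * a * np.sum((x - c) ** 2, axis=1) - b * np.sum(np.log(np.cosh(x)), axis=1)


x0 = np.zeros_like(c)
x_star, iters = batched_dca(grad_h, solve_convex_subproblem, x0)
print("DCA iterations:", iters)
print("energy decreased for every row:", bool(np.all(energy(x_star) <= energy(x0))))
```

With a = 2 > b = 1 this toy objective happens to be convex, so DCA reaches its global minimizer; for a genuinely nonconvex energy, DCA only guarantees a monotone decrease of f toward a critical point.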
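The training settings in the Experiment Setup row (Adam, learning rate 10⁻⁴, batch size 512, 10000 gradient steps) can be sketched without committing to a framework, since the paper does not name one or its versions (see the Software Dependencies row). The plain NumPy Adam loop below is an assumption for illustration only: the `TrainConfig` and `adam_train` names, the Adam values beta1 = 0.9, beta2 = 0.999, eps = 1e-8 (the common defaults, not stated in the row), and the toy linear-regression task are hypothetical and do not reproduce the paper's models.

```python
import numpy as np
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Reported in the Experiment Setup row:
    learning_rate: float = 1e-4
    batch_size: int = 512
    num_steps: int = 10_000
    # Assumed Adam defaults (not stated in the row):
    beta1: float = 0.9
    beta2: float = 0.999
    eps: float = 1e-8


def adam_train(grad_fn, params, sample_batch, cfg):
    """Plain Adam: one bias-corrected update per sampled mini-batch."""
    m = np.zeros_like(params)   # first-moment estimate
    v = np.zeros_like(params)   # second-moment estimate
    for t in range(1, cfg.num_steps + 1):
        g = grad_fn(params, sample_batch(cfg.batch_size))
        m = cfg.beta1 * m + (1 - cfg.beta1) * g
        v = cfg.beta2 * v + (1 - cfg.beta2) * g * g
        m_hat = m / (1 - cfg.beta1 ** t)
        v_hat = v / (1 - cfg.beta2 ** t)
        params = params - cfg.learning_rate * m_hat / (np.sqrt(v_hat) + cfg.eps)
    return params


# Hypothetical toy task: noise-free linear regression with 8 weights.
rng = np.random.default_rng(0)
w_true = rng.normal(scale=0.5, size=8)
X_all = rng.normal(size=(20_000, 8))
y_all = X_all @ w_true


def sample_batch(n):
    idx = rng.integers(0, len(X_all), size=n)
    return X_all[idx], y_all[idx]


def mse_grad(w, batch):
    X, y = batch
    return 2.0 * X.T @ (X @ w - y) / len(y)


def mse(w):
    return float(np.mean((X_all @ w - y_all) ** 2))


cfg = TrainConfig()
w0 = np.zeros(8)
w_fit = adam_train(mse_grad, w0, sample_batch, cfg)
print("MSE before:", mse(w0), "after:", mse(w_fit))
```

Because an Adam update typically moves each parameter by at most about the learning rate, 10⁻⁴ over 10000 steps allows roughly one unit of total movement per parameter in this sketch; the toy weights are drawn small so that this budget suffices.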