Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning Iterative Reasoning through Energy Diffusion
Authors: Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum
ICML 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks, particularly in more challenging scenarios. We show the effectiveness of IRED on three groups of tasks: continuous-space reasoning (e.g., matrix completion, inversion), discrete-space reasoning (e.g., Sudoku solving, graph connectivity prediction), and planning (e.g., finding paths on graphs). |
| Researcher Affiliation | Academia | Yilun Du¹\*, Jiayuan Mao¹\*, Joshua Tenenbaum¹. ¹MIT. Correspondence to: Yilun Du <EMAIL>, Jiayuan Mao <EMAIL>. |
| Pseudocode | Yes | We provide full pseudocode for training our approach in Section 3.4 with training following Algorithm 1 and inference following Algorithm 2. |
| Open Source Code | Yes | Code and visualizations are at https://energy-based-model.github.io/ired. |
| Open Datasets | Yes | We use the dataset from SAT-Net (Wang et al., 2019) as the training and standard test dataset. Our harder dataset is from RRN (Palm et al., 2018) which is a different Sudoku dataset where the number of given numbers is within [17, 34]. For Connectivity tasks, we generate random graphs using algorithms from Graves et al. (2016). |
| Dataset Splits | No | We aim to learn a neural network-based prediction model $\mathrm{NN}_\theta(\cdot)$ which can generalize execution $\mathrm{NN}_\theta(x')$ to a test distribution $x' \in \mathbb{R}^{O'}$, where $x'$ can be significantly larger and more challenging than the training data $x \in X$ (e.g., of higher dimensions, or with larger number magnitudes), by leveraging a possibly increased computational budget. |
| Hardware Specification | Yes | Models were trained in approximately 2 hours on a single Nvidia RTX 2080 using a training batch size of 2048 and the Adam optimizer with learning rate 1e-4. |
| Software Dependencies | No | Models were trained in approximately 2 hours on a single Nvidia RTX 2080 using a training batch size of 2048 and the Adam optimizer with learning rate 1e-4. |
| Experiment Setup | Yes | Models were trained in approximately 2 hours on a single Nvidia RTX 2080 using a training batch size of 2048 and the Adam optimizer with learning rate 1e-4. For Sudoku, we train models for 50000 iterations using a single Nvidia RTX 2080 using a training batch size of 64 with the Adam optimizer with learning rate 1e-4. |