Learning Iterative Reasoning through Energy Diffusion
Authors: Yilun Du, Jiayuan Mao, Joshua B. Tenenbaum
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that IRED outperforms existing methods in continuous-space reasoning, discrete-space reasoning, and planning tasks, particularly in more challenging scenarios. We show the effectiveness of IRED on three groups of tasks: continuous-space reasoning (e.g., matrix completion, inversion), discrete-space reasoning (e.g., Sudoku solving, graph connectivity prediction), and planning (e.g., finding paths on graphs). |
| Researcher Affiliation | Academia | Yilun Du¹*, Jiayuan Mao¹*, Joshua Tenenbaum¹. ¹MIT. Correspondence to: Yilun Du <yilundu@mit.edu>, Jiayuan Mao <jiayuanm@mit.edu>. |
| Pseudocode | Yes | We provide full pseudocode for training our approach in Section 3.4 with training following Algorithm 1 and inference following Algorithm 2. |
| Open Source Code | Yes | Code and visualizations are at https://energy-based-model.github.io/ired. |
| Open Datasets | Yes | We use the dataset from SAT-Net (Wang et al., 2019) as the training and standard test dataset. Our harder dataset is from RRN (Palm et al., 2018) which is a different Sudoku dataset where the number of given numbers is within [17, 34]. For Connectivity tasks, we generate random graphs using algorithms from Graves et al. (2016). |
| Dataset Splits | No | We aim to learn a neural network-based prediction model NNθ(·) which can generalize execution NNθ(x′) to a test distribution x′ ∈ ℝ^O, where x′ can be significantly larger and more challenging than the training data x ∈ X (e.g., of higher dimensions, or with larger number magnitudes), by leveraging a possibly increased computational budget. |
| Hardware Specification | Yes | Models were trained in approximately 2 hours on a single Nvidia RTX 2080 using a training batch size of 2048 and the Adam optimizer with learning rate 1e-4. |
| Software Dependencies | No | Models were trained in approximately 2 hours on a single Nvidia RTX 2080 using a training batch size of 2048 and the Adam optimizer with learning rate 1e-4. |
| Experiment Setup | Yes | Models were trained in approximately 2 hours on a single Nvidia RTX 2080 using a training batch size of 2048 and the Adam optimizer with learning rate 1e-4. For Sudoku, we train models for 50000 iterations on a single Nvidia RTX 2080 with a training batch size of 64 and the Adam optimizer with learning rate 1e-4. (A hedged sketch of this setup follows the table.) |
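
The Pseudocode and Experiment Setup rows quote the paper's Algorithm 1 (training), Algorithm 2 (inference), and optimizer settings (Adam, learning rate 1e-4, batch sizes 2048 / 64). As a rough illustration of what an energy-diffusion training loop and annealed-inference loop of that shape might look like, the PyTorch sketch below is provided; the `EnergyNet` module, the denoising target, and the annealing schedule are illustrative assumptions, not the authors' released code or their exact Algorithms 1 and 2.

```python
import torch
import torch.nn as nn


class EnergyNet(nn.Module):
    """Hypothetical energy model E_theta(x, y, t) -> one scalar per sample.

    The paper uses task-specific architectures; this MLP is only a stand-in
    so that the training and inference loops below are runnable.
    """

    def __init__(self, x_dim, y_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + y_dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, x, y, t):
        # Condition the energy on the problem x, candidate solution y, and noise level t.
        return self.net(torch.cat([x, y, t], dim=-1)).squeeze(-1)


def train_step(model, opt, x, y_true):
    """One denoising-style training step (loosely the role of Algorithm 1)."""
    # Sample a noise level and corrupt the ground-truth solution.
    t = torch.rand(x.shape[0], 1, device=x.device)
    y_noisy = (y_true + t * torch.randn_like(y_true)).detach().requires_grad_(True)

    # Train the energy gradient to point from the corrupted solution back
    # toward the clean one (the exact objective in the paper may differ).
    energy = model(x, y_noisy, t).sum()
    grad = torch.autograd.grad(energy, y_noisy, create_graph=True)[0]
    loss = ((grad - (y_noisy - y_true)) ** 2).mean()

    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()


def infer(model, x, y_dim, num_levels=10, steps_per_level=5, step_size=0.1):
    """Annealed energy minimization (loosely the role of Algorithm 2)."""
    y = torch.randn(x.shape[0], y_dim, device=x.device)
    for level in reversed(range(num_levels)):
        t = torch.full((x.shape[0], 1), level / num_levels, device=x.device)
        for _ in range(steps_per_level):
            y = y.detach().requires_grad_(True)
            grad = torch.autograd.grad(model(x, y, t).sum(), y)[0]
            y = y - step_size * grad  # gradient descent on the current energy landscape
    return y.detach()


if __name__ == "__main__":
    # Optimizer and batch size taken from the quoted setup: Adam, lr 1e-4, batch 2048.
    model = EnergyNet(x_dim=16, y_dim=16)
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)
    x, y = torch.randn(2048, 16), torch.randn(2048, 16)
    print(train_step(model, opt, x, y))
    print(infer(model, x, y_dim=16).shape)
```

The Sudoku setting quoted above (50000 iterations, batch size 64, same Adam learning rate) would plug into the same loop; only the data pipeline and the energy architecture would change.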