Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Skill-Driven Neurosymbolic State Abstractions
Authors: Alper Ahmetoglu, Steven James, Cameron Allen, Sam Lobel, David Abel, George Konidaris
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We develop algorithms for constructing the abstraction from data and for planning with it, and apply them to visual chainwalk and maze tasks. We then generalize these results to factored actions, which modify only some state variables in the ground MDP. We characterize the conditions under which this generates factored abstract states, provide an algorithm that constructs the corresponding abstraction, and apply it to a visual gridworld and Montezuma's Revenge, a long-horizon Atari task [15]. These results provide a powerful and principled framework for learning neurosymbolic abstract decision processes. |
| Researcher Affiliation | Academia | Alper Ahmetoglu Brown University Steven James University of the Witwatersrand Cameron Allen UC Berkeley Sam Lobel Brown University David Abel University of Edinburgh George Konidaris Brown University |
| Pseudocode | Yes | Algorithm 1: Constructing and Refining an Abstract MDP; Algorithm 2: Compute State Error; Algorithm 3: refine_state |
| Open Source Code | Yes | The source code is provided in the supplementary material with sufficient explanation to reproduce the results. |
| Open Datasets | Yes | We demonstrate our approach using a high-dimensional chainwalk domain of length 6 (Figure 3a). The agent is equipped with actions to move left or right to adjacent states, but with 5% probability, the subsequent state is selected uniformly at random. At each state, the agent observes a sampled 28×28 MNIST digit [42] (see Appendix A.1 for more detail). We apply the resulting neurosymbolic algorithm to a visual Miniworld [17] maze domain. We next apply the algorithm to the challenging Atari game Montezuma's Revenge [15]. The state space is given by the annotated RAM states [7] corresponding to 14 factors over 16 variables |
| Dataset Splits | No | Figure 4b shows that the abstract MDP achieves better performance with far fewer samples than DQN. In Figure 4c, we increased the observation resolution from 60×80 pixels to 120×160 and 180×240 pixels, while keeping the sample size fixed. We use an expert policy that completes the first room 10% of the time, and otherwise deviates from the plan and executes random actions. This privileged process can be replaced by an exploration module that visits abstract states with fewer samples (or higher transition errors) to expand the frontier of the abstract MDP by further refinement, which we leave as future work. |
| Hardware Specification | Yes | We used a GPU cluster to train multiple MSA networks in batch and build the abstract MDP locally on a laptop with Apple M1 Pro chip and 16 GB of unified memory. DQN models are trained locally on this same laptop and another with NVIDIA RTX 4070 GPU and Intel Core Ultra 9 Processor 185H. |
| Software Dependencies | No | We further compare the planning performance of the learned abstract MDP with the goal-conditioned Deep Q-Network (DQN) [48] using the Stable Baselines 3 implementation [51]. We used a two-layered multi-layer perceptron (MLP) with 64 hidden units and ReLU activations for the MSA encoder. We used the default structure of the DQN model that is provided in stable-baselines3 [51] with the only difference in the output dimensionality that depends on the number of actions in the domain. |
| Experiment Setup | Yes | Hyperparameter details are provided in the supplementary material together with the source code. Table 1: MSA hyperparameters; Table 2: Alg. 1 hyperparameters; Table 3: DQN hyperparameters |
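
The chainwalk dynamics quoted in the Open Datasets row (length 6, left/right moves, 5% chance of a uniformly random next state) can be illustrated with a minimal sketch. This is not the authors' code: the names `CHAIN_LENGTH`, `SLIP_PROB`, and `step` are hypothetical, clamping at the two ends of the chain is an assumption the excerpt does not confirm, and the MNIST observation for each state is omitted.

```python
import random

CHAIN_LENGTH = 6   # chain length stated in the quoted excerpt
SLIP_PROB = 0.05   # 5% chance that the next state is uniformly random

def step(state, action, rng):
    """Return the next chain index.

    action is -1 (move left) or +1 (move right).
    Assumption: moves past either end are clamped to the boundary;
    the excerpt does not say how the boundaries behave.
    """
    if rng.random() < SLIP_PROB:
        # Noisy transition: any state in the chain, chosen uniformly.
        return rng.randrange(CHAIN_LENGTH)
    # Intended transition: move to the adjacent state, clamped to the chain.
    return min(max(state + action, 0), CHAIN_LENGTH - 1)

if __name__ == "__main__":
    rng = random.Random(0)
    state = 0
    for _ in range(10):
        state = step(state, +1, rng)
    print("state after 10 right moves:", state)
```

Passing the random generator in explicitly keeps rollouts reproducible under a fixed seed, which matters when comparing sample efficiency across methods.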