Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Skill-Driven Neurosymbolic State Abstractions

Authors: Alper Ahmetoglu, Steven James, Cameron Allen, Sam Lobel, David Abel, George Konidaris

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We develop algorithms for constructing the abstraction from data and for planning with it, and apply them to visual chainwalk and maze tasks. We then generalize these results to factored actions, which modify only some state variables in the ground MDP. We characterize the conditions under which this generates factored abstract states, provide an algorithm that constructs the corresponding abstraction, and apply it to a visual gridworld and Montezuma's Revenge, a long-horizon Atari task [15]. These results provide a powerful and principled framework for learning neurosymbolic abstract decision processes.
Researcher Affiliation	Academia	Alper Ahmetoglu (Brown University); Steven James (University of the Witwatersrand); Cameron Allen (UC Berkeley); Sam Lobel (Brown University); David Abel (University of Edinburgh); George Konidaris (Brown University)
Pseudocode	Yes	Algorithm 1: Constructing and Refining an Abstract MDP; Algorithm 2: Compute State Error; Algorithm 3: refine_state
Open Source Code	Yes	The source code is provided in the supplementary material with sufficient explanation to reproduce the results.
Open Datasets	Yes	We demonstrate our approach using a high-dimensional chainwalk domain of length 6 (Figure 3a). The agent is equipped with actions to move left or right to adjacent states, but with 5% probability, the subsequent state is selected uniformly at random. At each state, the agent observes a sampled 28 × 28 MNIST digit [42] (see Appendix A.1 for more detail). We apply the resulting neurosymbolic algorithm to a visual Miniworld [17] maze domain. We next apply the algorithm to the challenging Atari game Montezuma's Revenge [15]. The state space is given by the annotated RAM states [7] corresponding to 14 factors over 16 variables.
Dataset Splits	No	Figure 4b shows that the abstract MDP achieves better performance with far fewer samples than DQN. In Figure 4c, we increased the observation resolution from 60 × 80 pixels to 120 × 160 and 180 × 240 pixels, while keeping the sample size fixed. We use an expert policy that completes the first room 10% of the time, and otherwise deviates from the plan and executes random actions. This privileged process can be replaced by an exploration module that visits abstract states with fewer samples (or higher transition errors) to expand the frontier of the abstract MDP by further refinement, which we leave as future work.
Hardware Specification	Yes	We used a GPU cluster to train multiple MSA networks in batch and build the abstract MDP locally on a laptop with Apple M1 Pro chip and 16 GB of unified memory. DQN models are trained locally on this same laptop and another with NVIDIA RTX 4070 GPU and Intel Core Ultra 9 Processor 185H.
Software Dependencies	No	We further compare the planning performance of the learned abstract MDP with the goal-conditioned Deep Q-Network (DQN) [48] using the Stable Baselines 3 implementation [51]. We used a two-layered multi-layer perceptron (MLP) with 64 hidden units and ReLU activations for the MSA encoder. We used the default structure of the DQN model that is provided in stable-baselines3 [51] with the only difference in the output dimensionality that depends on the number of actions in the domain.
Experiment Setup	Yes	Hyperparameter details are provided in the supplementary material together with the source code. Table 1: MSA hyperparameters; Table 2: Alg. 1 hyperparameters; Table 3: DQN hyperparameters
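
Illustrative sketch (not from the paper): the chainwalk dynamics quoted under Open Datasets can be written down in a few lines. The chain length of 6 and the 5% chance of a uniformly random next state come from the quoted text. The action encoding, the function names, and the behavior at the two ends of the chain (the agent stays in place) are assumptions made only for this example, and the MNIST observation step is omitted.

```python
import random

N_STATES = 6        # chain length, from the quoted text
P_RANDOM = 0.05     # chance of a uniformly random next state, from the quoted text
LEFT, RIGHT = 0, 1  # action encoding (assumption)


def intended_target(state, action):
    """State reached when the move succeeds.

    Assumption: moving off either end of the chain leaves the agent in place.
    """
    delta = -1 if action == LEFT else 1
    return min(max(state + delta, 0), N_STATES - 1)


def step(state, action, rng):
    """Sample the next state index for one transition."""
    if rng.random() < P_RANDOM:
        return rng.randrange(N_STATES)  # uniform over all states
    return intended_target(state, action)


def transition_matrix(action):
    """Exact transition probabilities P[s][s'] under the same assumptions."""
    P = [[P_RANDOM / N_STATES] * N_STATES for _ in range(N_STATES)]
    for s in range(N_STATES):
        P[s][intended_target(s, action)] += 1.0 - P_RANDOM
    return P
```

Under these assumptions, each row of `transition_matrix(RIGHT)` sums to 1, and the intended neighbor receives probability 0.95 + 0.05/6 because the random jump can also land on it.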