Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Graph Neural Network Based Action Ranking for Planning

Authors: Rajesh Mangannavar, Stefan Lee, Alan Fern, Prasad Tadepalli

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results across standard planning benchmarks demonstrate that our action-ranking approach not only achieves better generalization to larger problems than those used in training but also outperforms multiple baselines (value function and action ranking) methods in terms of success rate and plan quality.
Researcher Affiliation	Academia	Rajesh Mangannavar Oregon State University Corvallis, OR 97330, USA EMAIL
Pseudocode	Yes	Algorithm 1: Graph Attention-Based Action Ranking (GABAR); Procedure DECODER(g^L, {v_i^L}_{i ∈ V}, A, O, k); Algorithm 2: Beam Search Action Decoder
Open Source Code	Yes	Code: https://github.com/Learning-for-Seq-Decision-Making/GABAR-Graph-based-action-ranking-for-planning.
Open Datasets	Yes	All problems were generated using openly available PDDL-generators [21]. Dataset submitted as supplementary material.
Dataset Splits	Yes	We divide the test set for each domain into 3 separate subsets, easy, medium, and hard with increasing difficulty with problem sizes as defined in table 1 along with the training and validation dataset sizes. Each test subset has 100 problems. In contrast, the training dataset consists of problems smaller and simpler than the ones found in the easy subset.
Hardware Specification	Yes	It takes between 1-2 hours to train a model for each domain on an RTX 3080.
Software Dependencies	No	The paper mentions 'Adam optimizer' and 'GNN' but does not specify version numbers for these or any other software libraries or programming languages used.
Experiment Setup	Yes	For all domains, we train the model using the Adam optimizer with a learning rate of 0.0005, 9 rounds of GNN, and batch size of 16, hidden dimensionality of 64. Training proceeds for a maximum of 500 epochs, and we select the model checkpoint that achieves the lowest loss on the validation set for evaluation.
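
The Experiment Setup row can be restated as a small configuration object plus a checkpoint-selection rule. The following is a minimal sketch, not the authors' code: the names `TrainConfig` and `select_best_epoch` are hypothetical, the example losses are made up for illustration, and breaking ties in favor of the earliest epoch is an assumption the source does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Values as reported in the Experiment Setup row; field names are hypothetical.
    optimizer: str = "adam"
    learning_rate: float = 0.0005
    gnn_rounds: int = 9
    batch_size: int = 16
    hidden_dim: int = 64
    max_epochs: int = 500


def select_best_epoch(val_losses):
    """Return the 0-based index of the epoch with the lowest validation loss.

    Ties go to the earliest epoch (an assumption; the source does not say).
    """
    if not val_losses:
        raise ValueError("val_losses must contain at least one epoch")
    return min(range(len(val_losses)), key=val_losses.__getitem__)


config = TrainConfig()
# Made-up per-epoch validation losses, for illustration only.
example_losses = [0.92, 0.61, 0.48, 0.53, 0.48]
# Only epochs within the training budget are eligible for selection.
best_epoch = select_best_epoch(example_losses[: config.max_epochs])
```

In an actual training loop, the checkpoint saved at `best_epoch` would be the one carried forward to evaluation on the easy, medium, and hard test subsets.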