Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
From End-to-end to Step-by-step: Learning to Abstract via Abductive Reinforcement Learning
Authors: Zilong Wang, Jiongda Wang, Xiaoyong Chen, Meng Wang, Ming Ma, ZhiPeng Wang, Zhenyu Zhou, Tianming Yang, Wang-Zhou Dai
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that A2RL can mitigate the delayed reward problem and improve the generalization capability compared to traditional end-to-end RL methods. We conducted experiments on two sets of benchmark tasks, and the results showed that A2RL effectively learned the abstract structure, improving performance and easing the challenge of learning from delayed feedback. |
| Researcher Affiliation | Academia | The affiliations listed are: National Key Laboratory for Novel Software Technology, Nanjing University, China; School of Intelligence Science and Technology, Nanjing University, China; Institute of Neuroscience, State Key Laboratory of Brain Cognition and Brain-inspired Intelligence Technology, Center for Excellence in Brain Science and Intelligence Technology, Chinese Academy of Sciences, Shanghai, 200031, China; University of Chinese Academy of Sciences, School of Future Technology, Beijing, 100049, China. All these are academic or public research institutions, and the email domains (`.edu.cn`, `.ac.cn`) confirm an academic setting. |
| Pseudocode | No | The paper describes the A2RL framework textually and with conceptual diagrams (e.g., Figure 3), but it does not include a structured pseudocode block or formal algorithm listing. |
| Open Source Code | Yes | The code is available at https://github.com/sporeking/A2RL. |
| Open Datasets | Yes | Experiments are conducted in two types of benchmark environments with delayed reward feedback: (1) Minigrid [Chevalier-Boisvert et al., 2023]: Minigrid, implemented in Gymnasium [Towers et al., 2024], is a suite of easily configurable grid-world environments specifically designed for RL research. (2) Taxi [Dietterich, 2000]: In Taxi, the taxi must pick up the passenger, drive to the destination, and drop them off to end the episode. |
| Dataset Splits | Yes | We replicated the setup in Section 5.2 but trained agents on procedurally generated random maps throughout the entire curriculum, and subsequently evaluated and finetuned on 50 unseen maps of comparable difficulty after training. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for conducting the experiments. |
| Software Dependencies | No | The paper mentions software such as Minigrid (implemented in Gymnasium), PPO2, and D3QN, but it does not specify version numbers for these dependencies, which are needed for reproducibility. |
| Experiment Setup | No | The paper states total training steps (e.g., 300k, 600k, 900k) and describes the use of PPO2 and D3QN algorithms, but it does not provide specific hyperparameter values such as learning rates, batch sizes, or optimizer settings, which are crucial for reproducing the experiments. |