Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Ranking Policy Decisions
Authors: Hadrien Pouget, Hana Chockler, Youcheng Sun, Daniel Kroening
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on a diverse set of standard benchmarks demonstrate that pruned policies can perform on a level comparable to the original policies. |
| Researcher Affiliation | Collaboration | Hadrien Pouget University of Cambridge UK EMAIL Hana Chockler causaLens and King's College London UK EMAIL EMAIL Youcheng Sun Queen's University Belfast UK EMAIL Daniel Kroening Amazon UK EMAIL |
| Pseudocode | No | The paper describes the method in Section 3, but does not present a formal pseudocode block or algorithm. |
| Open Source Code | Yes | The code for reproducing our experiments is available on GitHub, and further examples are provided on the project website. https://github.com/hadrien-pouget/Ranking-Policy-Decisions. Experiments done at commit c972414. |
| Open Datasets | Yes | We experimented in several environments. The first is Minigrid [7], a gridworld... We also used CartPole [4], the classic control problem... Finally, to test our ability to scale, we ran experiments with Atari games [4]. |
| Dataset Splits | No | The paper describes generating a 'test suite' of mutant executions for evaluating their ranking method, but it does not specify train/validation/test dataset splits in the conventional machine learning sense for model training. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or types of computing resources used for experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming language versions, library versions, or specific solvers). |
| Experiment Setup | Yes | Overall, our algorithm has five (tunable) parameters: the size of the test suite |T(π)|, the passing condition C, the default action d, the mutation rate µ and the abstraction function α. ... In our experiments, we set the condition to be "receive more than X reward" for some X ∈ ℝ, and chose X to yield a balanced suite... In our experiments, we selected µ manually... Details about the state abstraction functions, policy training, hyperparameters, etc., are provided in the full version of this paper [22]. |
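
The five parameters and the passing condition quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not taken from the paper or its repository: the names `RankingParams`, `balanced_threshold`, and `passes` are hypothetical, and using the median reward as X is only one assumed way to obtain a roughly balanced suite (the quoted text does not say how X was actually chosen).

```python
from dataclasses import dataclass
from statistics import median
from typing import Callable, Hashable, Sequence


@dataclass(frozen=True)
class RankingParams:
    """Hypothetical container for the five tunable parameters named in the table."""
    suite_size: int                            # |T(pi)|: number of executions in the test suite
    reward_threshold: float                    # X in the condition "receive more than X reward"
    default_action: int                        # d: the default action
    mutation_rate: float                       # mu: the mutation rate
    abstraction: Callable[[object], Hashable]  # alpha: maps a state to its abstract state


def balanced_threshold(rewards: Sequence[float]) -> float:
    """Assumed heuristic: take X as the median reward so about half the runs pass."""
    return median(rewards)


def passes(reward: float, threshold: float) -> bool:
    """Passing condition C: the execution received more than X reward."""
    return reward > threshold


# Example with made-up rewards from six executions.
rewards = [1.0, 5.0, 2.0, 8.0, 3.0, 9.0]
x = balanced_threshold(rewards)
params = RankingParams(
    suite_size=len(rewards),
    reward_threshold=x,
    default_action=0,       # placeholder value
    mutation_rate=0.1,      # placeholder value
    abstraction=lambda s: s,  # identity abstraction as a placeholder
)
n_pass = sum(passes(r, params.reward_threshold) for r in rewards)
print(f"X = {x}, passing = {n_pass}, failing = {len(rewards) - n_pass}")
```

With many tied rewards the median may not split the suite evenly, so in practice the threshold would likely need checking against the resulting pass/fail counts.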