Ranking Policy Decisions
Authors: Hadrien Pouget, Hana Chockler, Youcheng Sun, Daniel Kroening
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on a diverse set of standard benchmarks demonstrate that pruned policies can perform on a level comparable to the original policies. |
| Researcher Affiliation | Collaboration | Hadrien Pouget (University of Cambridge, UK) pougeth@gmail.com; Hana Chockler (causaLens and King's College London, UK) hana@causalens.com, hana.chockler@kcl.ac.uk; Youcheng Sun (Queen's University Belfast, UK) youcheng.sun@qub.ac.uk; Daniel Kroening (Amazon, UK) daniel.kroening@magd.ox.ac.uk |
| Pseudocode | No | The paper describes the method in Section 3, but does not present a formal pseudocode block or algorithm. |
| Open Source Code | Yes | The code for reproducing our experiments is available on GitHub, and further examples are provided on the project website. https://github.com/hadrien-pouget/Ranking-Policy-Decisions. Experiments done at commit c972414 |
| Open Datasets | Yes | We experimented in several environments. The first is Minigrid [7], a gridworld... We also used CartPole [4], the classic control problem... Finally, to test our ability to scale, we ran experiments with Atari games [4]. |
| Dataset Splits | No | The paper describes generating a 'test suite' of mutant executions for evaluating the ranking method, but it does not specify conventional train/validation/test dataset splits for model training. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or types of computing resources used for experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming language versions, library versions, or specific solvers). |
| Experiment Setup | Yes | Overall, our algorithm has five (tunable) parameters: the size of the test suite \|T(π)\|, the passing condition C, the default action d, the mutation rate µ and the abstraction function α. ... In our experiments, we set the condition to be "receive more than X reward" for some X ∈ ℝ, and chose X to yield a balanced suite... In our experiments, we selected µ manually... Details about the state abstraction functions, policy training, hyperparameters, etc., are provided in the full version of this paper [22]. A sketch of how these parameters fit together is given below the table. |
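To make the Experiment Setup row concrete, the sketch below shows one plausible way the five parameters (test-suite size \|T(π)\|, passing condition C, default action d, mutation rate µ, abstraction function α) could interact in a mutation-based ranking procedure. This is a hedged reconstruction, not the authors' code: the Gym-style `env.reset()`/`env.step()` interface, the `policy` callable, and the simple failure-rate ranking statistic are all illustrative assumptions; the actual implementation is in the GitHub repository linked above.

```python
import random
from collections import defaultdict

def build_test_suite(env, policy, default_action, mu, alpha,
                     passing_condition, suite_size, max_steps=500, seed=0):
    """Run `suite_size` mutant episodes. In each episode, every abstract
    state is independently mutated with probability `mu`: in a mutated
    state, the default action replaces the policy's action. Returns a
    list of (mutation map, passed) pairs."""
    rng = random.Random(seed)
    suite = []
    for _ in range(suite_size):
        mutated = {}                      # abstract state -> bool
        total_reward = 0.0
        state = env.reset()               # assumed Gym-style interface
        for _ in range(max_steps):
            a_state = alpha(state)
            if a_state not in mutated:    # decide once per state per episode
                mutated[a_state] = rng.random() < mu
            action = default_action if mutated[a_state] else policy(state)
            state, reward, done = env.step(action)
            total_reward += reward
            if done:
                break
        suite.append((mutated, passing_condition(total_reward)))
    return suite

def rank_states(suite):
    """Rank abstract states by how often mutating them coincides with a
    failing episode (an illustrative spectrum-style statistic; the paper's
    ranking function may differ)."""
    fails = defaultdict(int)
    trials = defaultdict(int)
    for mutated, passed in suite:
        for a_state, was_mutated in mutated.items():
            if was_mutated:
                trials[a_state] += 1
                if not passed:
                    fails[a_state] += 1
    score = {s: fails[s] / trials[s] for s in trials}
    return sorted(score, key=score.get, reverse=True)

# Example wiring (all names hypothetical):
#   suite = build_test_suite(env, policy, default_action=0, mu=0.1,
#                            alpha=tuple, passing_condition=lambda r: r > 100,
#                            suite_size=1000)
#   ranking = rank_states(suite)
```

Under these assumptions, a pruned policy would presumably keep π's action only in the top-ranked abstract states and fall back to the default action d elsewhere, which is consistent with the pruning result quoted in the Research Type row.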