Ranking Policy Decisions

Authors: Hadrien Pouget, Hana Chockler, Youcheng Sun, Daniel Kroening

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments on a diverse set of standard benchmarks demonstrate that pruned policies can perform on a level comparable to the original policies."
Researcher Affiliation | Collaboration | Hadrien Pouget, University of Cambridge, UK (pougeth@gmail.com); Hana Chockler, causaLens and King's College London, UK (hana@causalens.com, hana.chockler@kcl.ac.uk); Youcheng Sun, Queen's University Belfast, UK (youcheng.sun@qub.ac.uk); Daniel Kroening, Amazon, UK (daniel.kroening@magd.ox.ac.uk)
Pseudocode | No | The paper describes the method in Section 3, but does not present a formal pseudocode block or algorithm.
Open Source Code | Yes | "The code for reproducing our experiments is available on GitHub, and further examples are provided on the project website." https://github.com/hadrien-pouget/Ranking-Policy-Decisions. Experiments were done at commit c972414 (a checkout sketch follows the table).
Open Datasets | Yes | "We experimented in several environments. The first is Minigrid [7], a gridworld... We also used CartPole [4], the classic control problem... Finally, to test our ability to scale, we ran experiments with Atari games [4]." (An environment-instantiation sketch follows the table.)
Dataset Splits | No | The paper describes generating a 'test suite' of mutant executions for evaluating their ranking method, but it does not specify train/validation/test dataset splits in the conventional machine-learning sense for model training.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or the types of computing resources used for experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming language versions, library versions, or specific solvers).
Experiment Setup | Yes | "Overall, our algorithm has five (tunable) parameters: the size of the test suite |T(π)|, the passing condition C, the default action d, the mutation rate µ and the abstraction function α. ... In our experiments, we set the condition to be 'receive more than X reward' for some X ∈ ℝ, and chose X to yield a balanced suite... In our experiments, we selected µ manually... Details about the state abstraction functions, policy training, hyperparameters, etc., are provided in the full version of this paper [22]." (A configuration sketch follows the table.)
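
The Open Source Code row pins the experiments to commit c972414 of the authors' repository. Below is a minimal sketch for fetching the code at that commit; only the repository URL and commit hash come from the table above, and the use of Python's subprocess module with a local git installation is an assumption about the reader's setup.

```python
import subprocess

# Repository URL and commit hash are taken from the Open Source Code row;
# everything else (local directory layout, subprocess-based git calls) is assumed.
REPO = "https://github.com/hadrien-pouget/Ranking-Policy-Decisions"
COMMIT = "c972414"

subprocess.run(["git", "clone", REPO], check=True)
subprocess.run(["git", "checkout", COMMIT], cwd="Ranking-Policy-Decisions", check=True)
```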
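The Open Datasets row names three environment families: Minigrid, CartPole, and Atari. The sketch below shows one way to instantiate a representative of each with OpenAI Gym; the specific environment IDs, the pre-0.26 Gym step/reset API, and the gym_minigrid and Atari extras are assumptions rather than the paper's exact setup.

```python
import gym
import gym_minigrid  # noqa: F401  # registers MiniGrid-* environments (assumed dependency)

# Representative environments for the three families named in the paper;
# the exact IDs and versions used by the authors may differ.
env_ids = ["MiniGrid-Empty-8x8-v0", "CartPole-v1", "Breakout-v4"]

for env_id in env_ids:
    env = gym.make(env_id)
    obs = env.reset()
    obs, reward, done, info = env.step(env.action_space.sample())
    print(env_id, "reward:", reward, "done:", done)
    env.close()
```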
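The Experiment Setup row lists five tunable parameters: the test-suite size |T(π)|, the passing condition C, the default action d, the mutation rate µ, and the abstraction function α. The dataclass below is a hedged sketch of how such a configuration could be grouped in code; the field names, types, and example values are illustrative assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class RankingConfig:
    """Illustrative container for the five tunable parameters named in the paper."""
    test_suite_size: int                        # |T(pi)|: number of mutant executions
    passing_condition: Callable[[float], bool]  # C: decides whether an execution passes
    default_action: Any                         # d: action taken in mutated states
    mutation_rate: float                        # mu: probability of mutating a decision
    abstraction: Callable[[Any], Any]           # alpha: state abstraction function

# Example instantiation mirroring the paper's passing condition
# ("receive more than X reward"); X and all other values here are made up.
X = 100.0
config = RankingConfig(
    test_suite_size=2000,
    passing_condition=lambda total_reward: total_reward > X,
    default_action=0,
    mutation_rate=0.1,
    abstraction=lambda state: tuple(state),
)
```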