Ranking Policy Decisions
Authors: Hadrien Pouget, Hana Chockler, Youcheng Sun, Daniel Kroening
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments on a diverse set of standard benchmarks demonstrate that pruned policies can perform on a level comparable to the original policies. |
| Researcher Affiliation | Collaboration | Hadrien Pouget (University of Cambridge, UK) pougeth@gmail.com; Hana Chockler (causaLens and King's College London, UK) hana@causalens.com, hana.chockler@kcl.ac.uk; Youcheng Sun (Queen's University Belfast, UK) youcheng.sun@qub.ac.uk; Daniel Kroening (Amazon, UK) daniel.kroening@magd.ox.ac.uk |
| Pseudocode | No | The paper describes the method in Section 3, but does not present a formal pseudocode block or algorithm. |
| Open Source Code | Yes | The code for reproducing our experiments is available on GitHub, and further examples are provided on the project website. https://github.com/hadrien-pouget/Ranking-Policy-Decisions. Experiments done at commit c972414 |
| Open Datasets | Yes | We experimented in several environments. The first is Minigrid [7], a gridworld... We also used CartPole [4], the classic control problem... Finally, to test our ability to scale, we ran experiments with Atari games [4]. |
| Dataset Splits | No | The paper describes generating a 'test suite' of mutant executions for evaluating the ranking method, but it does not specify conventional train/validation/test dataset splits for model training. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models or types of computing resources used for experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming language versions, library versions, or specific solvers). |
| Experiment Setup | Yes | Overall, our algorithm has five (tunable) parameters: the size of the test suite \|T(π)\|, the passing condition C, the default action d, the mutation rate µ and the abstraction function α. ... In our experiments, we set the condition to be "receive more than X reward" for some X ∈ ℝ, and chose X to yield a balanced suite... In our experiments, we selected µ manually... Details about the state abstraction functions, policy training, hyperparameters, etc., are provided in the full version of this paper [22]. A sketch of how these parameters fit together is given below the table. |
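To make the Experiment Setup row concrete, the sketch below shows one plausible way the five parameters (test-suite size \|T(π)\|, passing condition C, default action d, mutation rate µ, abstraction function α) could interact in a mutation-based ranking procedure. This is a hedged reconstruction, not the authors' code: the Gym-style `env.reset()`/`env.step()` interface, the `policy` callable, and the simple failure-rate ranking statistic are all illustrative assumptions; the actual implementation is in the GitHub repository linked above.

```python
import random
from collections import defaultdict

def build_test_suite(env, policy, default_action, mu, alpha,
                     passing_condition, suite_size, max_steps=500, seed=0):
    """Run `suite_size` mutant episodes. In each episode, every abstract
    state is independently mutated with probability `mu`: in a mutated
    state, the default action replaces the policy's action. Returns a
    list of (mutation map, passed) pairs."""
    rng = random.Random(seed)
    suite = []
    for _ in range(suite_size):
        mutated = {}                      # abstract state -> bool
        total_reward = 0.0
        state = env.reset()               # assumed Gym-style interface
        for _ in range(max_steps):
            a_state = alpha(state)
            if a_state not in mutated:    # decide once per state per episode
                mutated[a_state] = rng.random() < mu
            action = default_action if mutated[a_state] else policy(state)
            state, reward, done = env.step(action)
            total_reward += reward
            if done:
                break
        suite.append((mutated, passing_condition(total_reward)))
    return suite

def rank_states(suite):
    """Rank abstract states by how often mutating them coincides with a
    failing episode (an illustrative spectrum-style statistic; the paper's
    ranking function may differ)."""
    fails = defaultdict(int)
    trials = defaultdict(int)
    for mutated, passed in suite:
        for a_state, was_mutated in mutated.items():
            if was_mutated:
                trials[a_state] += 1
                if not passed:
                    fails[a_state] += 1
    score = {s: fails[s] / trials[s] for s in trials}
    return sorted(score, key=score.get, reverse=True)

# Example wiring (all names hypothetical):
#   suite = build_test_suite(env, policy, default_action=0, mu=0.1,
#                            alpha=tuple, passing_condition=lambda r: r > 100,
#                            suite_size=1000)
#   ranking = rank_states(suite)
```

Under these assumptions, a pruned policy would presumably keep π's action only in the top-ranked abstract states and fall back to the default action d elsewhere, which is consistent with the pruning result quoted in the Research Type row.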