ProtoX: Explaining a Reinforcement Learning Agent via Prototyping
Authors: Ronilo Ragodos, Tong Wang, Qihang Lin, Xun Zhou
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct various experiments to test ProtoX. Results show that ProtoX achieved high fidelity to the original black-box agent while providing meaningful and understandable explanations. |
| Researcher Affiliation | Academia | Ronilo J. Ragodos, Department of Business Analytics, University of Iowa, Iowa City, IA 52242, ronilo-ragodos@uiowa.edu; Tong Wang, Department of Business Analytics, University of Iowa, Iowa City, IA 52242, tong-wang@uiowa.edu; Qihang Lin, Department of Business Analytics, University of Iowa, Iowa City, IA 52242, qihang-lin@uiowa.edu; Xun Zhou, Department of Business Analytics, University of Iowa, Iowa City, IA 52242, xun-zhou@uiowa.edu |
| Pseudocode | Yes | Pseudocode and further details for our pre-training algorithm can be found in the supplementary material. |
| Open Source Code | Yes | Reproducibility: Our code is available at https://github.com/rrags/ProtoX. |
| Open Datasets | Yes | We use four video-game tasks from OpenAI Gym, namely, Pong, Seaquest, and two levels from Super Mario Bros [45]. |
| Dataset Splits | No | Both ProtoX and ResNet-BC are trained with the behavior cloning algorithm using 30,000 state-action pairs obtained via an expert trained with PPO. ... For each game, we let the agent generate a test set of 10,000 state-action pairs $D_{\text{test}} = \{s_i, \pi(s_i)\}_i$. |
| Hardware Specification | Yes | All experiments were done on a system with an RTX 3060Ti GPU, an AMD Ryzen 7 3700X 8-Core Processor, and 32GB RAM. |
| Software Dependencies | No | The paper mentions using 'stable-baselines3' and 'PPO models', but does not specify exact version numbers for these software packages or other key libraries like Python or PyTorch in the main text. |
| Experiment Setup | No | See the Appendix for hyperparameter settings and further details on our experimental design. |
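
The Dataset Splits row describes a held-out test set of state-action pairs labeled by the expert, and the Research Type row reports fidelity to the black-box agent. A minimal sketch of how such a fidelity score could be computed is given below, assuming fidelity means the fraction of test states on which the imitator picks the same action as the expert; the paper's exact definition is not quoted here. The names `build_test_set`, `fidelity`, `toy_expert`, and `toy_imitator` are hypothetical and do not come from the ProtoX code base.

```python
from typing import Callable, Iterable, List, Tuple

State = int   # stand-in for a game frame; real states would be image arrays
Action = int  # index of a discrete action

Policy = Callable[[State], Action]


def build_test_set(states: Iterable[State], expert: Policy) -> List[Tuple[State, Action]]:
    """Label each state with the expert's action, giving pairs (s_i, pi(s_i))."""
    return [(s, expert(s)) for s in states]


def fidelity(test_set: List[Tuple[State, Action]], imitator: Policy) -> float:
    """Fraction of test states where the imitator agrees with the expert label.

    Assumption: fidelity is plain action agreement (accuracy against the expert).
    """
    if not test_set:
        raise ValueError("test_set must contain at least one state-action pair")
    matches = sum(1 for state, action in test_set if imitator(state) == action)
    return matches / len(test_set)


def toy_expert(state: State) -> Action:
    """Hypothetical expert policy over three discrete actions."""
    return state % 3


def toy_imitator(state: State) -> Action:
    """Hypothetical imitator that copies the expert except on multiples of 10."""
    return state % 3 if state % 10 else 0


if __name__ == "__main__":
    test_set = build_test_set(range(100), toy_expert)
    print(f"fidelity = {fidelity(test_set, toy_imitator):.2f}")
```

In a real evaluation the toy policies would be replaced by the trained PPO expert and the behavior-cloned model, and the states would be frames gathered by rolling out the expert in each game.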