What Did You Think Would Happen? Explaining Agent Behaviour through Intended Outcomes
Authors: Herman Yau, Chris Russell, Simon Hadfield
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate our method on multiple reinforcement learning problems, and provide code to help researchers introspect their RL environments and algorithms. ... We evaluate our approach on three standard environments using OpenAI Gym [8]: Blackjack, Cartpole [7] and Taxi [11]. |
| Researcher Affiliation | Collaboration | Herman Yau (CVSSP, University of Surrey); Chris Russell (Amazon Web Services); Simon Hadfield (CVSSP, University of Surrey) |
| Pseudocode | No | The paper describes update rules mathematically (e.g., Equation 7) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | We demonstrate our method on multiple reinforcement learning problems, and provide code to help researchers introspect their RL environments and algorithms. Code: https://github.com/hmhyau/rl-intention |
| Open Datasets | Yes | We evaluate our approach on three standard environments using OpenAI Gym [8]: Blackjack, Cartpole [7] and Taxi [11]. |
| Dataset Splits | No | The paper does not explicitly provide specific dataset split information (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions OpenAI Gym and the Adam optimizer but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | The agent is trained with α = 0.1, γ = 1 for 500k episodes. ... The vanilla Q-learning agent is trained with learning rate α = 0.1 and γ = 1. The DQN agent is trained using Adam [17] with gradient clipping at [-1, 1] and α = 0.0001, γ = 1... The Q-learning agent is trained with α = 0.4, γ = 1... DQN used α = 0.0001, γ = 1 for 100k episodes. (Hedged sketches of these setups follow the table.) |
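
To make the quoted setup concrete, below is a minimal sketch of a tabular Q-learning agent on the Taxi environment with the reported α = 0.4 and γ = 1. The environment ID (`Taxi-v3`), the classic pre-0.26 Gym step/reset API, the exploration rate, and the episode count are all assumptions; the paper's quotes report only the learning rate and discount factor, and this is a standard Q-learning loop, not the authors' released code.

```python
import gym
import numpy as np

# Assumption: environment ID and the classic (pre-0.26) Gym API;
# the paper names Taxi but not a version suffix.
env = gym.make("Taxi-v3")

alpha = 0.4          # learning rate, as quoted in the table
gamma = 1.0          # discount factor, as quoted in the table
epsilon = 0.1        # exploration rate: NOT reported in the paper
n_episodes = 10_000  # episode count for tabular Taxi is elided in the quote

Q = np.zeros((env.observation_space.n, env.action_space.n))

for _ in range(n_episodes):
    state = env.reset()
    done = False
    while not done:
        # Epsilon-greedy action selection.
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # Standard tabular Q-learning update:
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        target = reward + gamma * np.max(Q[next_state]) * (not done)
        Q[state, action] += alpha * (target - Q[state, action])
        state = next_state
```

Note that γ = 1 is only safe because these episodes terminate (Gym's Taxi wrapper enforces a 200-step limit), so returns stay finite.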
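
For the DQN rows, the quotes give Adam with α = 0.0001 and gradient clipping at [-1, 1], but name no deep learning framework or network architecture. The sketch below is a PyTorch assumption: a toy two-layer Q-network sized for CartPole, with per-element gradient value clipping, not the authors' implementation.

```python
import torch
import torch.nn as nn

# Illustrative Q-network only: the architecture is NOT specified in the
# quotes above (CartPole: 4 state inputs, 2 actions).
q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))

# Adam with alpha = 0.0001, as quoted in the table.
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)

def training_step(loss: torch.Tensor) -> None:
    optimizer.zero_grad()
    loss.backward()
    # Clip each gradient element to [-1, 1], matching the quoted
    # "gradient clipping at [-1, 1]".
    torch.nn.utils.clip_grad_value_(q_net.parameters(), clip_value=1.0)
    optimizer.step()

# Dummy usage: regress Q-values toward zero just to exercise the step.
dummy_states = torch.randn(8, 4)
training_step(q_net(dummy_states).pow(2).mean())
```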