Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

What Did You Think Would Happen? Explaining Agent Behaviour through Intended Outcomes

Authors: Herman Yau, Chris Russell, Simon Hadfield

NeurIPS 2020 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate our method on multiple reinforcement learning problems, and provide code to help researchers introspecting their RL environments and algorithms. ... We evaluate our approach on three standard environments using Open AI Gym [8] Blackjack, Cartpole [7] and Taxi [11]." |
| Researcher Affiliation | Collaboration | Herman Yau (CVSSP, University of Surrey); Chris Russell (Amazon Web Services); Simon Hadfield (CVSSP, University of Surrey) |
| Pseudocode | No | The paper describes update rules mathematically (e.g., Equation 7) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "We demonstrate our method on multiple reinforcement learning problems, and provide code to help researchers introspecting their RL environments and algorithms." (Footnote 1: https://github.com/hmhyau/rl-intention) |
| Open Datasets | Yes | "We evaluate our approach on three standard environments using Open AI Gym [8] Blackjack, Cartpole [7] and Taxi [11]." See the environment sketch below. |
| Dataset Splits | No | The paper does not explicitly provide dataset split information (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions Open AI Gym and the Adam optimizer but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | "The agent is trained with α = 0.1, γ = 1 for 500k episodes. ... The vanilla Q-learning agent is trained with learning rate α = 0.1 and γ = 1. The DQN agent is trained using Adam [17] with gradient clipping at [-1, 1] and α = 0.0001, γ = 1 ... The Q-learning agent is trained with α = 0.4, γ = 1 ... DQN used α = 0.0001, γ = 1 for 100k episodes." Illustrative sketches of these settings follow the table. |
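
For readers checking the Open Datasets row, the three benchmarks are standard OpenAI Gym environments. The sketch below shows how they are typically instantiated; the version suffixes (Blackjack-v0, CartPole-v1, Taxi-v3) and the classic pre-0.26 Gym API are assumptions, since the paper pins no versions (see the Software Dependencies row).

```python
# Minimal sketch: instantiating the three benchmark environments named in
# the paper. Version suffixes are assumptions -- the paper does not pin them.
import gym

ENV_IDS = ["Blackjack-v0", "CartPole-v1", "Taxi-v3"]  # assumed version tags

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()  # classic (pre-0.26) Gym API: reset() returns obs only
    done, steps = False, 0
    while not done and steps < 10:
        action = env.action_space.sample()           # random policy, smoke test
        obs, reward, done, info = env.step(action)   # 4-tuple in the classic API
        steps += 1
    env.close()
    print(f"{env_id}: action_space={env.action_space}")
```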
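
The Experiment Setup row quotes tabular Q-learning settings (α = 0.1 or 0.4, γ = 1). Below is a minimal sketch of the standard tabular Q-learning update under the quoted Taxi settings (α = 0.4, γ = 1); the ε-greedy exploration rate and episode count are assumptions, and this is not the authors' implementation (theirs is at https://github.com/hmhyau/rl-intention).

```python
# Sketch: standard tabular Q-learning with the hyperparameters quoted for
# Taxi (alpha = 0.4, gamma = 1). Illustrative only -- not the authors' code.
import numpy as np
import gym

env = gym.make("Taxi-v3")   # version suffix is an assumption
alpha, gamma = 0.4, 1.0     # values quoted in the Experiment Setup row
epsilon = 0.1               # exploration rate: an assumption, not quoted
Q = np.zeros((env.observation_space.n, env.action_space.n))

for episode in range(1_000):  # episode count shortened for illustration
    state, done = env.reset(), False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
```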
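
The same row quotes the DQN settings: Adam with α = 0.0001 and gradient clipping at [-1, 1]. The sketch below shows one way to realize those two settings in PyTorch; the framework choice, network shape, and loss function are assumptions not stated in the quotes.

```python
# Sketch: Adam at lr = 1e-4 with per-element gradient clipping to [-1, 1],
# as quoted in the Experiment Setup row. PyTorch and the tiny CartPole-shaped
# network are assumptions; the paper's framework is not stated in the quotes.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # assumed shape
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)             # alpha = 0.0001

def training_step(states, actions, td_targets):
    """One illustrative DQN gradient step with the quoted clipping."""
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_pred, td_targets)
    optimizer.zero_grad()
    loss.backward()
    # Clip each gradient element into [-1, 1] before the optimizer step.
    torch.nn.utils.clip_grad_value_(q_net.parameters(), clip_value=1.0)
    optimizer.step()
    return loss.item()

# Smoke test with dummy data (batch of 32 CartPole-like transitions).
states = torch.randn(32, 4)
actions = torch.randint(0, 2, (32,))
td_targets = torch.randn(32)
print(training_step(states, actions, td_targets))
```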