Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

What Did You Think Would Happen? Explaining Agent Behaviour through Intended Outcomes

Authors: Herman Yau, Chris Russell, Simon Hadfield

NeurIPS 2020 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate our method on multiple reinforcement learning problems, and provide code to help researchers introspecting their RL environments and algorithms. ... We evaluate our approach on three standard environments using Open AI Gym [8] Blackjack, Cartpole [7] and Taxi [11]." |
| Researcher Affiliation | Collaboration | Herman Yau (CVSSP, University of Surrey); Chris Russell (Amazon Web Services); Simon Hadfield (CVSSP, University of Surrey) |
| Pseudocode | No | The paper describes update rules mathematically (e.g., Equation 7) but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | "We demonstrate our method on multiple reinforcement learning problems, and provide code to help researchers introspecting their RL environments and algorithms." (Footnote 1: https://github.com/hmhyau/rl-intention) |
| Open Datasets | Yes | "We evaluate our approach on three standard environments using Open AI Gym [8] Blackjack, Cartpole [7] and Taxi [11]." See the environment sketch below. |
| Dataset Splits | No | The paper does not explicitly provide dataset split information (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions Open AI Gym and the Adam optimizer but does not provide version numbers for any software dependencies. |
| Experiment Setup | Yes | "The agent is trained with α = 0.1, γ = 1 for 500k episodes. ... The vanilla Q-learning agent is trained with learning rate α = 0.1 and γ = 1. The DQN agent is trained using Adam [17] with gradient clipping at [-1, 1] and α = 0.0001, γ = 1 ... The Q-learning agent is trained with α = 0.4, γ = 1 ... DQN used α = 0.0001, γ = 1 for 100k episodes." Illustrative sketches of these settings follow the table. |
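
For readers checking the Open Datasets row, the three benchmarks are standard OpenAI Gym environments. The sketch below shows how they are typically instantiated; the version suffixes (Blackjack-v0, CartPole-v1, Taxi-v3) and the classic pre-0.26 Gym API are assumptions, since the paper pins no versions (see the Software Dependencies row).

```python
# Minimal sketch: instantiating the three benchmark environments named in
# the paper. Version suffixes are assumptions -- the paper does not pin them.
import gym

ENV_IDS = ["Blackjack-v0", "CartPole-v1", "Taxi-v3"]  # assumed version tags

for env_id in ENV_IDS:
    env = gym.make(env_id)
    obs = env.reset()  # classic (pre-0.26) Gym API: reset() returns obs only
    done, steps = False, 0
    while not done and steps < 10:
        action = env.action_space.sample()           # random policy, smoke test
        obs, reward, done, info = env.step(action)   # 4-tuple in the classic API
        steps += 1
    env.close()
    print(f"{env_id}: action_space={env.action_space}")
```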
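
The Experiment Setup row quotes tabular Q-learning settings (α = 0.1 or 0.4, γ = 1). Below is a minimal sketch of the standard tabular Q-learning update under the quoted Taxi settings (α = 0.4, γ = 1); the ε-greedy exploration rate and episode count are assumptions, and this is not the authors' implementation (theirs is at https://github.com/hmhyau/rl-intention).

```python
# Sketch: standard tabular Q-learning with the hyperparameters quoted for
# Taxi (alpha = 0.4, gamma = 1). Illustrative only -- not the authors' code.
import numpy as np
import gym

env = gym.make("Taxi-v3")   # version suffix is an assumption
alpha, gamma = 0.4, 1.0     # values quoted in the Experiment Setup row
epsilon = 0.1               # exploration rate: an assumption, not quoted
Q = np.zeros((env.observation_space.n, env.action_space.n))

for episode in range(1_000):  # episode count shortened for illustration
    state, done = env.reset(), False
    while not done:
        # epsilon-greedy action selection
        if np.random.rand() < epsilon:
            action = env.action_space.sample()
        else:
            action = int(np.argmax(Q[state]))
        next_state, reward, done, _ = env.step(action)
        # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
        state = next_state
```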
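
The same row quotes the DQN settings: Adam with α = 0.0001 and gradient clipping at [-1, 1]. The sketch below shows one way to realize those two settings in PyTorch; the framework choice, network shape, and loss function are assumptions not stated in the quotes.

```python
# Sketch: Adam at lr = 1e-4 with per-element gradient clipping to [-1, 1],
# as quoted in the Experiment Setup row. PyTorch and the tiny CartPole-shaped
# network are assumptions; the paper's framework is not stated in the quotes.
import torch
import torch.nn as nn

q_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))  # assumed shape
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-4)             # alpha = 0.0001

def training_step(states, actions, td_targets):
    """One illustrative DQN gradient step with the quoted clipping."""
    q_pred = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    loss = nn.functional.mse_loss(q_pred, td_targets)
    optimizer.zero_grad()
    loss.backward()
    # Clip each gradient element into [-1, 1] before the optimizer step.
    torch.nn.utils.clip_grad_value_(q_net.parameters(), clip_value=1.0)
    optimizer.step()
    return loss.item()

# Smoke test with dummy data (batch of 32 CartPole-like transitions).
states = torch.randn(32, 4)
actions = torch.randint(0, 2, (32,))
td_targets = torch.randn(32)
print(training_step(states, actions, td_targets))
```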