Predicting Future Actions of Reinforcement Learning Agents
Authors: Stephen Chung, Scott Niekum, David Krueger
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper experimentally evaluates and compares the effectiveness of future action and event prediction for three types of RL agents: explicitly planning, implicitly planning, and non-planning. We conduct extensive experiments to address the above research questions. |
| Researcher Affiliation | Academia | Stephen Chung (University of Cambridge); Scott Niekum (University of Massachusetts Amherst); David Krueger (Mila) |
| Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode or algorithm blocks with structured, code-like formatting. |
| Open Source Code | Yes | Full code is available at https://github.com/stephen-chung-mh/predict_action. Code: The code used for these experiments is available at https://github.com/stephen-chung-mh/predict_action and is based on the public code released in Thinker [6]. |
| Open Datasets | No | The paper uses the Sokoban environment and describes how data was generated (50k transitions). While Sokoban is a known game, there is no specific URL, DOI, or citation provided for a pre-existing, publicly accessible dataset of Sokoban levels or generated transitions used for training. |
| Dataset Splits | Yes | We generate 50,000 training samples, 10,000 evaluation samples, and 10,000 testing samples using the trained agents. |
| Hardware Specification | Yes | Computational Resources: Each agent is trained using a single A100 GPU, with training time varying by algorithm. |
| Software Dependencies | No | The paper mentions general components like 'convolutional network', 'three-layer Transformer encoder', and 'Adam optimizer', but does not provide specific version numbers for software libraries, frameworks (like PyTorch or TensorFlow), or programming languages used. |
| Experiment Setup | Yes | We evaluate the performance of predictors with varying training data sizes: 1k, 2k, 5k, 10k, 20k, 50k. We utilize a batch size of 128 and an Adam optimizer with a learning rate of 0.0001. Training is halted when the validation loss fails to improve for 10 consecutive steps. |
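The early-stopping criterion quoted in the Experiment Setup row (halt when validation loss fails to improve for 10 consecutive steps) can be sketched as a small helper. This is an illustrative sketch, not the authors' code: the `EarlyStopping` class and the simulated loss sequence are hypothetical, and only the hyperparameters (batch size 128, Adam learning rate 0.0001, patience 10) come from the paper.

```python
# Hypothetical sketch of the training-loop stopping rule described in the
# Experiment Setup row. The hyperparameter values are from the paper; the
# helper class and loss sequence below are illustrative only.

BATCH_SIZE = 128       # batch size reported in the paper
LEARNING_RATE = 1e-4   # Adam learning rate reported in the paper
PATIENCE = 10          # validation checks without improvement before halting

class EarlyStopping:
    """Stop training once validation loss has not improved for `patience` checks."""

    def __init__(self, patience: int):
        self.patience = patience
        self.best = float("inf")
        self.bad_steps = 0

    def step(self, val_loss: float) -> bool:
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_steps = 0
        else:
            self.bad_steps += 1
        return self.bad_steps >= self.patience

# Example: validation loss plateaus after an initial drop, so stopping
# triggers 10 checks after the last improvement.
stopper = EarlyStopping(PATIENCE)
losses = [1.0, 0.8, 0.7] + [0.7] * 15
stopped_at = next(i for i, loss in enumerate(losses) if stopper.step(loss))
```

In this example the last improvement occurs at index 2, so the rule fires at index 12, after 10 non-improving checks.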