Predicting Future Actions of Reinforcement Learning Agents

Authors: Stephen Chung, Scott Niekum, David Krueger

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This paper experimentally evaluates and compares the effectiveness of future action and event prediction for three types of RL agents: explicitly planning, implicitly planning, and non-planning. We conduct extensive experiments to address the above research questions.
Researcher Affiliation | Academia | Stephen Chung (University of Cambridge), Scott Niekum (University of Massachusetts Amherst), David Krueger (Mila)
Pseudocode | No | The paper describes algorithms and methods but does not include any explicitly labeled pseudocode or algorithm blocks with structured, code-like formatting.
Open Source Code | Yes | Full code is available at https://github.com/stephen-chung-mh/predict_action and is based on the public code released in Thinker [6].
Open Datasets | No | The paper uses the Sokoban environment and describes how the data was generated (50k transitions). While Sokoban is a well-known game, no URL, DOI, or citation is provided for a pre-existing, publicly accessible dataset of Sokoban levels or generated transitions used for training.
Dataset Splits | Yes | We generate 50,000 training samples, 10,000 evaluation samples, and 10,000 testing samples using the trained agents. (A minimal split sketch appears after the table.)
Hardware Specification | Yes | Computational Resources: Each agent is trained using a single A100 GPU, with training time varying by algorithm.
Software Dependencies | No | The paper mentions general components such as a convolutional network, a three-layer Transformer encoder, and the Adam optimizer, but does not provide specific version numbers for software libraries, frameworks (such as PyTorch or TensorFlow), or programming languages used. (A sketch of such a predictor appears after the table.)
Experiment Setup | Yes | We evaluate the performance of predictors with varying training data sizes: 1k, 2k, 5k, 10k, 20k, and 50k. We utilize a batch size of 128 and an Adam optimizer with a learning rate of 0.0001. Training is halted when the validation loss fails to improve for 10 consecutive steps. (The early-stopping loop is sketched after the table.)
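
For concreteness, the 50,000/10,000/10,000 split reported above could be produced along the following lines. This is a minimal sketch, not the authors' code: the agent/environment interface is a hypothetical Gymnasium-style stand-in, and collect_transitions is an illustrative helper.

import random

def collect_transitions(agent, env, n):
    """Roll out a trained agent and record (observation, action) pairs."""
    data = []
    obs, _ = env.reset()                       # Gymnasium-style API assumed
    while len(data) < n:
        action = agent.act(obs)                # hypothetical agent interface
        data.append((obs, action))
        obs, reward, terminated, truncated, _ = env.step(action)
        if terminated or truncated:
            obs, _ = env.reset()
    return data

# 70,000 transitions total: 50k training, 10k evaluation, 10k testing.
transitions = collect_transitions(agent, env, 70_000)
random.shuffle(transitions)
train_set = transitions[:50_000]
eval_set = transitions[50_000:60_000]
test_set = transitions[60_000:]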
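The components the paper does name (a convolutional network feeding a three-layer Transformer encoder) suggest a predictor of roughly the following shape. This is a PyTorch sketch under assumed channel counts, model width, and action-space size; the paper does not pin these details down.

import torch
import torch.nn as nn

class ActionPredictor(nn.Module):
    """Predicts an agent's future action from an observation.

    All sizes below are illustrative assumptions; the paper states only
    that a convolutional network and a three-layer Transformer encoder
    are used.
    """

    def __init__(self, in_channels=3, d_model=128, n_actions=5):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, d_model, kernel_size=3, padding=1), nn.ReLU(),
        )
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=3)  # three layers
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, x):                      # x: (batch, C, H, W)
        h = self.conv(x)                       # (batch, d_model, H, W)
        tokens = h.flatten(2).transpose(1, 2)  # (batch, H*W, d_model)
        z = self.encoder(tokens).mean(dim=1)   # average-pool spatial tokens
        return self.head(z)                    # action logits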
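Similarly, the stated optimization setup (batch size 128, Adam at learning rate 0.0001, halting after 10 consecutive non-improving validation checks) corresponds to a loop like the one below. The data loaders and the evaluate helper are hypothetical, ActionPredictor is the sketch above, and "steps" is read here as validation checks.

import torch

model = ActionPredictor()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = torch.nn.CrossEntropyLoss()

best_val, patience, bad_checks = float("inf"), 10, 0
while bad_checks < patience:
    model.train()
    for obs, action in train_loader:           # DataLoader with batch_size=128, assumed
        optimizer.zero_grad()
        loss = loss_fn(model(obs), action)
        loss.backward()
        optimizer.step()

    val_loss = evaluate(model, eval_loader)    # hypothetical validation-loss helper
    if val_loss < best_val:
        best_val, bad_checks = val_loss, 0     # improvement resets the counter
    else:
        bad_checks += 1                        # halt after 10 in a row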