Encoding Human Domain Knowledge to Warm Start Reinforcement Learning
Authors: Andrew Silva, Matthew Gombolay
AAAI 2021, pp. 5042-5050 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our approach on two OpenAI Gym tasks and two modified StarCraft 2 tasks, showing that our novel architecture outperforms multilayer-perceptron and recurrent architectures. |
| Researcher Affiliation | Academia | Andrew Silva, Matthew Gombolay; Institute for Robotics and Intelligent Machines, Georgia Institute of Technology; andrew.silva@gatech.edu, matthew.gombolay@cc.gatech.edu |
| Pseudocode | Yes | Algorithm 1 Intelligent Initialization; Algorithm 2 Dynamic Growth. A hedged sketch of the initialization idea appears after this table. |
| Open Source Code | Yes | Code for our implementation and experiments is available at https://github.com/CORE-Robotics-Lab/ProLoNets |
| Open Datasets | Yes | OpenAI Gym (Brockman et al. 2016) lunar lander and cart pole environments. StarCraft II (SC2) for macro and micro battles as well as the OpenAI Gym (Brockman et al. 2016) lunar lander and cart pole environments...the FindAndDefeatZerglings minigame from the SC2LE (Vinyals et al. 2017) |
| Dataset Splits | No | The paper describes the environments used (Open AI Gym, Star Craft II) and evaluates agent performance, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware components (e.g., GPU models, CPU types, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Proximal Policy Optimization (PPO), MLP, LSTM, and SC2 API, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | All agents are updated using Proximal Policy Optimization (PPO) (Schulman et al. 2017), with policy updates after each episode. Additional implementation details are available in the appendix. The paper mentions architectural choices (e.g., '1-layer architectures', '7-layer architecture') and training strategies ('update is rolled back'), but specific hyperparameter values (e.g., learning rate, batch size) are not provided in the main text. A hedged sketch of this per-episode update regime appears after this table. |
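The Pseudocode row above points to Algorithm 1 (Intelligent Initialization), in which hand-written rules seed the weights of a differentiable decision-tree policy. The sketch below only illustrates that idea under assumed details: the `SoftDecisionNode` and `WarmStartedTree` classes, the one-hot weight initialization, and the CartPole-style rule are hypothetical and are not taken from the authors' ProLoNets code.

```python
# Hypothetical sketch of warm-starting a differentiable decision tree from a
# hand-written rule, in the spirit of Algorithm 1 (Intelligent Initialization).
# Class and parameter names are illustrative, not the paper's implementation.
import torch
import torch.nn as nn


class SoftDecisionNode(nn.Module):
    """One decision node: sigmoid(alpha * (w^T x - c)) is the probability of
    routing an observation down the 'true' branch of the rule."""

    def __init__(self, num_features: int, feature_idx: int, threshold: float):
        super().__init__()
        # Warm start: a one-hot weight vector selects the feature the human
        # rule compares against, and the bias encodes the rule's threshold.
        w = torch.zeros(num_features)
        w[feature_idx] = 1.0
        self.weight = nn.Parameter(w)
        self.threshold = nn.Parameter(torch.tensor(threshold))
        self.alpha = nn.Parameter(torch.tensor(1.0))  # sharpness of the split

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.alpha * (obs @ self.weight - self.threshold))


class WarmStartedTree(nn.Module):
    """Tiny two-leaf tree: a single human rule routes between two action priors."""

    def __init__(self, num_features, feature_idx, threshold, num_actions,
                 true_action, false_action):
        super().__init__()
        self.node = SoftDecisionNode(num_features, feature_idx, threshold)
        # Leaves are learnable action preferences, initialized to favor the
        # action the human rule would take on each branch.
        true_leaf = torch.full((num_actions,), -1.0)
        true_leaf[true_action] = 1.0
        false_leaf = torch.full((num_actions,), -1.0)
        false_leaf[false_action] = 1.0
        self.true_leaf = nn.Parameter(true_leaf)
        self.false_leaf = nn.Parameter(false_leaf)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        p_true = self.node(obs).unsqueeze(-1)
        logits = p_true * self.true_leaf + (1.0 - p_true) * self.false_leaf
        return torch.softmax(logits, dim=-1)


# Illustrative rule: "if cart velocity (feature 1) > 0, push right, else left".
policy = WarmStartedTree(num_features=4, feature_idx=1, threshold=0.0,
                         num_actions=2, true_action=1, false_action=0)
probs = policy(torch.randn(4))
```

Because every rule-derived weight remains a learnable parameter, subsequent policy-gradient updates can move the agent away from the human prior if the rules turn out to be suboptimal.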
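The Experiment Setup row states only that all agents are trained with PPO and that the policy is updated after each episode, without giving hyperparameters. The loop below is a generic per-episode PPO-style update applied to the warm-started `policy` sketched above, not the authors' training code: the learning rate, clip ratio, epoch count, discount factor, and the use of normalized discounted returns in place of a learned value baseline are placeholder assumptions, and the snippet assumes the classic `gym` reset/step API (pre-0.26).

```python
# Generic per-episode PPO-style update loop (placeholder hyperparameters).
# Reuses the `policy` object constructed in the previous sketch.
import gym
import torch

env = gym.make("CartPole-v1")
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # placeholder lr
clip_eps, ppo_epochs, gamma = 0.2, 4, 0.99                  # placeholder values

for episode in range(1000):
    obs, done = env.reset(), False  # classic gym API: reset() returns obs only
    observations, actions, rewards, old_log_probs = [], [], [], []
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        dist = torch.distributions.Categorical(policy(obs_t))
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())
        observations.append(obs_t)
        actions.append(action)
        rewards.append(reward)
        old_log_probs.append(dist.log_prob(action).detach())

    # Discounted returns as a crude advantage signal (no value baseline here).
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.insert(0, running)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    obs_batch = torch.stack(observations)
    act_batch = torch.stack(actions)
    old_lp_batch = torch.stack(old_log_probs)

    # One clipped-surrogate update per episode, as described in the table row.
    for _ in range(ppo_epochs):
        dist = torch.distributions.Categorical(policy(obs_batch))
        ratio = torch.exp(dist.log_prob(act_batch) - old_lp_batch)
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        loss = -torch.min(ratio * returns, clipped * returns).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```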