Encoding Human Domain Knowledge to Warm Start Reinforcement Learning

Authors: Andrew Silva, Matthew Gombolay

AAAI 2021, pp. 5042-5050 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our approach on two OpenAI Gym tasks and two modified StarCraft 2 tasks, showing that our novel architecture outperforms multilayer-perceptron and recurrent architectures.
Researcher Affiliation | Academia | Andrew Silva, Matthew Gombolay, Institute for Robotics and Intelligent Machines, Georgia Institute of Technology, andrew.silva@gatech.edu, gombolay@cc.gatech.edu
Pseudocode | Yes | Algorithm 1: Intelligent Initialization; Algorithm 2: Dynamic Growth. A simplified initialization sketch appears after this table.
Open Source Code | Yes | Code for our implementation and experiments is available at https://github.com/CORE-Robotics-Lab/ProLoNets
Open Datasets | Yes | OpenAI Gym (Brockman et al. 2016) lunar lander and cart pole environments; StarCraft II (SC2) for macro and micro battles as well as the OpenAI Gym (Brockman et al. 2016) lunar lander and cart pole environments...the FindAndDefeatZerglings minigame from the SC2LE (Vinyals et al. 2017). An environment-setup sketch appears after this table.
Dataset Splits | No | The paper describes the environments used (OpenAI Gym, StarCraft II) and evaluates agent performance, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts) for reproducibility.
Hardware Specification | No | The paper does not explicitly mention any specific hardware (e.g., GPU models, CPU types, memory) used to run its experiments.
Software Dependencies | No | The paper mentions software components such as Proximal Policy Optimization (PPO), MLP and LSTM baselines, and the SC2 API, but it does not provide version numbers for these or any other software dependencies.
Experiment Setup | No | All agents are updated using Proximal Policy Optimization (PPO) (Schulman et al. 2017), with policy updates after each episode. Additional implementation details are available in the appendix. The paper mentions architectural choices (e.g., '1-layer architectures', '7-layer architecture') and training strategies (an 'update is rolled back'), but specific hyperparameter values (e.g., learning rate, batch size) are not provided in the main text. A sketch of the per-episode update with rollback appears after this table.
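
To make the Pseudocode row concrete, the following is a minimal sketch of the intelligent-initialization idea: a hand-written rule is encoded as the initial weights of a small differentiable decision tree, so the RL agent starts from the expert's policy rather than from random parameters. This is not the authors' ProLoNet code; the CartPole heuristic, the feature index, and all class names are illustrative assumptions.

    import torch
    import torch.nn as nn

    class SoftDecisionNode(nn.Module):
        """One differentiable decision node: sigmoid(w . x + b)."""
        def __init__(self, weights, bias):
            super().__init__()
            self.weights = nn.Parameter(torch.tensor(weights, dtype=torch.float32))
            self.bias = nn.Parameter(torch.tensor(bias, dtype=torch.float32))

        def forward(self, x):
            return torch.sigmoid(x @ self.weights + self.bias)

    class WarmStartTreePolicy(nn.Module):
        """Two-leaf tree encoding a hypothetical CartPole rule:
        'if the pole leans right, push right; otherwise push left'."""
        def __init__(self):
            super().__init__()
            # The weight vector selects the pole-angle feature (index 2 of the
            # 4-dimensional CartPole observation); the bias encodes a threshold of 0.
            self.node = SoftDecisionNode(weights=[0.0, 0.0, 1.0, 0.0], bias=0.0)
            # Leaf logits over the two actions [push left, push right]; still learnable.
            self.leaf_true = nn.Parameter(torch.tensor([0.0, 1.0]))
            self.leaf_false = nn.Parameter(torch.tensor([1.0, 0.0]))

        def forward(self, x):
            p = self.node(x).unsqueeze(-1)                # probability of the 'true' branch
            logits = p * self.leaf_true + (1.0 - p) * self.leaf_false
            return torch.softmax(logits, dim=-1)          # action distribution

    # Query the warm-started policy on a dummy observation batch.
    policy = WarmStartTreePolicy()
    print(policy(torch.zeros(1, 4)))   # near-uniform at angle 0; shifts as the pole leans

Because the tree is differentiable end to end, subsequent PPO updates can refine both the decision weights and the leaf distributions instead of training a policy from scratch.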
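
For the Open Datasets row, the Gym tasks can be instantiated directly. The environment IDs and the classic 4-tuple Gym step API below are assumptions; the paper names lunar lander and cart pole but does not state exact Gym versions.

    import gym

    # Environment IDs are assumed; LunarLander additionally requires the box2d extra.
    for env_id in ("CartPole-v1", "LunarLander-v2"):
        env = gym.make(env_id)
        obs = env.reset()
        done, total_reward = False, 0.0
        while not done:
            action = env.action_space.sample()          # random placeholder for the agent
            obs, reward, done, info = env.step(action)  # classic (pre-0.26) Gym API
            total_reward += reward
        print(env_id, "random-policy return:", total_reward)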
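
The Experiment Setup row quotes per-episode PPO updates and notes that an 'update is rolled back'. The sketch below shows one way such a rollback could be wrapped around an update step; the caller-supplied callables, the evaluation-based criterion, and the 0.5 threshold are assumptions, not the paper's rule.

    import copy
    import torch.nn as nn

    def update_with_rollback(policy: nn.Module, do_ppo_update, evaluate) -> bool:
        """Apply one PPO update and undo it if evaluation return collapses.

        `do_ppo_update(policy)` performs one clipped-surrogate PPO step and
        `evaluate(policy)` returns an average episode return; both are
        hypothetical, caller-supplied placeholders.
        """
        snapshot = copy.deepcopy(policy.state_dict())  # checkpoint the current weights
        baseline = evaluate(policy)                    # return before the update
        do_ppo_update(policy)                          # one PPO policy/value update
        if evaluate(policy) < 0.5 * baseline:          # hypothetical degradation test
            policy.load_state_dict(snapshot)           # roll the harmful update back
            return False
        return True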