Encoding Human Domain Knowledge to Warm Start Reinforcement Learning
Authors: Andrew Silva, Matthew Gombolay
AAAI 2021, pp. 5042-5050 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our approach on two OpenAI Gym tasks and two modified StarCraft 2 tasks, showing that our novel architecture outperforms multilayer-perceptron and recurrent architectures. |
| Researcher Affiliation | Academia | Andrew Silva, Matthew Gombolay; Institute for Robotics and Intelligent Machines, Georgia Institute of Technology; andrew.silva@gatech.edu, matthew.gombolay@cc.gatech.edu |
| Pseudocode | Yes | Algorithm 1 Intelligent Initialization; Algorithm 2 Dynamic Growth. A hedged sketch of the initialization idea appears after this table. |
| Open Source Code | Yes | Code for our implementation and experiments is available at https://github.com/CORE-Robotics-Lab/ProLoNets |
| Open Datasets | Yes | OpenAI Gym (Brockman et al. 2016) lunar lander and cart pole environments. StarCraft II (SC2) for macro and micro battles as well as the OpenAI Gym (Brockman et al. 2016) lunar lander and cart pole environments...the FindAndDefeatZerglings minigame from the SC2LE (Vinyals et al. 2017) |
| Dataset Splits | No | The paper describes the environments used (Open AI Gym, Star Craft II) and evaluates agent performance, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware components (e.g., GPU models, CPU types, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Proximal Policy Optimization (PPO), MLP, LSTM, and SC2 API, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | All agents are updated using Proximal Policy Optimization (PPO) (Schulman et al. 2017), with policy updates after each episode. Additional implementation details are available in the appendix. The paper mentions architectural choices (e.g., '1-layer architectures', '7-layer architecture') and training strategies ('update is rolled back'), but specific hyperparameter values (e.g., learning rate, batch size) are not provided in the main text. A hedged sketch of this per-episode update regime appears after this table. |
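The Pseudocode row above points to Algorithm 1 (Intelligent Initialization), in which hand-written rules seed the weights of a differentiable decision-tree policy. The sketch below only illustrates that idea under assumed details: the `SoftDecisionNode` and `WarmStartedTree` classes, the one-hot weight initialization, and the CartPole-style rule are hypothetical and are not taken from the authors' ProLoNets code.

```python
# Hypothetical sketch of warm-starting a differentiable decision tree from a
# hand-written rule, in the spirit of Algorithm 1 (Intelligent Initialization).
# Class and parameter names are illustrative, not the paper's implementation.
import torch
import torch.nn as nn


class SoftDecisionNode(nn.Module):
    """One decision node: sigmoid(alpha * (w^T x - c)) is the probability of
    routing an observation down the 'true' branch of the rule."""

    def __init__(self, num_features: int, feature_idx: int, threshold: float):
        super().__init__()
        # Warm start: a one-hot weight vector selects the feature the human
        # rule compares against, and the bias encodes the rule's threshold.
        w = torch.zeros(num_features)
        w[feature_idx] = 1.0
        self.weight = nn.Parameter(w)
        self.threshold = nn.Parameter(torch.tensor(threshold))
        self.alpha = nn.Parameter(torch.tensor(1.0))  # sharpness of the split

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.alpha * (obs @ self.weight - self.threshold))


class WarmStartedTree(nn.Module):
    """Tiny two-leaf tree: a single human rule routes between two action priors."""

    def __init__(self, num_features, feature_idx, threshold, num_actions,
                 true_action, false_action):
        super().__init__()
        self.node = SoftDecisionNode(num_features, feature_idx, threshold)
        # Leaves are learnable action preferences, initialized to favor the
        # action the human rule would take on each branch.
        true_leaf = torch.full((num_actions,), -1.0)
        true_leaf[true_action] = 1.0
        false_leaf = torch.full((num_actions,), -1.0)
        false_leaf[false_action] = 1.0
        self.true_leaf = nn.Parameter(true_leaf)
        self.false_leaf = nn.Parameter(false_leaf)

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        p_true = self.node(obs).unsqueeze(-1)
        logits = p_true * self.true_leaf + (1.0 - p_true) * self.false_leaf
        return torch.softmax(logits, dim=-1)


# Illustrative rule: "if cart velocity (feature 1) > 0, push right, else left".
policy = WarmStartedTree(num_features=4, feature_idx=1, threshold=0.0,
                         num_actions=2, true_action=1, false_action=0)
probs = policy(torch.randn(4))
```

Because every rule-derived weight remains a learnable parameter, subsequent policy-gradient updates can move the agent away from the human prior if the rules turn out to be suboptimal.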
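The Experiment Setup row states only that all agents are trained with PPO and that the policy is updated after each episode, without giving hyperparameters. The loop below is a generic per-episode PPO-style update applied to the warm-started `policy` sketched above, not the authors' training code: the learning rate, clip ratio, epoch count, discount factor, and the use of normalized discounted returns in place of a learned value baseline are placeholder assumptions, and the snippet assumes the classic `gym` reset/step API (pre-0.26).

```python
# Generic per-episode PPO-style update loop (placeholder hyperparameters).
# Reuses the `policy` object constructed in the previous sketch.
import gym
import torch

env = gym.make("CartPole-v1")
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # placeholder lr
clip_eps, ppo_epochs, gamma = 0.2, 4, 0.99                  # placeholder values

for episode in range(1000):
    obs, done = env.reset(), False  # classic gym API: reset() returns obs only
    observations, actions, rewards, old_log_probs = [], [], [], []
    while not done:
        obs_t = torch.as_tensor(obs, dtype=torch.float32)
        dist = torch.distributions.Categorical(policy(obs_t))
        action = dist.sample()
        obs, reward, done, _ = env.step(action.item())
        observations.append(obs_t)
        actions.append(action)
        rewards.append(reward)
        old_log_probs.append(dist.log_prob(action).detach())

    # Discounted returns as a crude advantage signal (no value baseline here).
    returns, running = [], 0.0
    for r in reversed(rewards):
        running = r + gamma * running
        returns.insert(0, running)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)

    obs_batch = torch.stack(observations)
    act_batch = torch.stack(actions)
    old_lp_batch = torch.stack(old_log_probs)

    # One clipped-surrogate update per episode, as described in the table row.
    for _ in range(ppo_epochs):
        dist = torch.distributions.Categorical(policy(obs_batch))
        ratio = torch.exp(dist.log_prob(act_batch) - old_lp_batch)
        clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps)
        loss = -torch.min(ratio * returns, clipped * returns).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```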