Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Encoding Human Domain Knowledge to Warm Start Reinforcement Learning
Authors: Andrew Silva, Matthew Gombolay
AAAI 2021, pp. 5042–5050 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our approach on two OpenAI Gym tasks and two modified StarCraft II tasks, showing that our novel architecture outperforms multilayer-perceptron and recurrent architectures. |
| Researcher Affiliation | Academia | Andrew Silva, Matthew Gombolay Institute for Robotics and Intelligent Machines, Georgia Institute of Technology andrew.silva@EMAIL |
| Pseudocode | Yes | Algorithm 1 Intelligent Initialization; Algorithm 2 Dynamic Growth |
| Open Source Code | Yes | Code for our implementation and experiments is available at https://github.com/CORE-Robotics-Lab/ProLoNets |
| Open Datasets | Yes | OpenAI Gym (Brockman et al. 2016) lunar lander and cart pole environments. StarCraft II (SC2) for macro and micro battles as well as the OpenAI Gym (Brockman et al. 2016) lunar lander and cart pole environments...the FindAndDefeatZerglings minigame from the SC2LE (Vinyals et al. 2017) |
| Dataset Splits | No | The paper describes the environments used (Open AI Gym, Star Craft II) and evaluates agent performance, but it does not specify explicit training, validation, and test dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not explicitly mention any specific hardware components (e.g., GPU models, CPU types, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions software like Proximal Policy Optimization (PPO), MLP, LSTM, and SC2 API, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | No | All agents are updated using Proximal Policy Optimization (PPO) (Schulman et al. 2017), with policy updates after each episode. Additional implementation details are available in the appendix. The paper mentions architectural choices (e.g., '1-layer architectures', '7-layer architecture') and training strategies ('update is rolled back'), but specific hyperparameter values (e.g., learning rate, batch size) are not provided in the main text. |
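For context on the training procedure referenced above: the paper states only that agents are updated with PPO (Schulman et al. 2017) after each episode, without hyperparameter values. The sketch below shows the clipped surrogate objective at the core of PPO; the function name and the `clip_eps=0.2` default are illustrative assumptions, not values taken from the paper.

```python
import math

def ppo_clip_objective(new_logps, old_logps, advantages, clip_eps=0.2):
    """Clipped surrogate objective from PPO (Schulman et al. 2017).

    new_logps/old_logps: per-step log-probabilities of the taken actions
    under the current and pre-update policies; advantages: estimated
    advantages for those steps. clip_eps is an assumed default.
    """
    total = 0.0
    for new_lp, old_lp, adv in zip(new_logps, old_logps, advantages):
        ratio = math.exp(new_lp - old_lp)            # pi_new(a|s) / pi_old(a|s)
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        total += min(ratio * adv, clipped * adv)     # pessimistic (clipped) bound
    return total / len(advantages)
```

Maximizing this objective keeps each policy update close to the previous policy, which is consistent with the paper's note that a harmful update can be "rolled back" without destabilizing training.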