Master-Slave Curriculum Design for Reinforcement Learning
Authors: Yuechen Wu, Wei Zhang, Ke Song
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluation on the ViZDoom platform demonstrates that the joint learning of the master agent and the slave agents mutually benefits both, yielding significant improvement over A3C in terms of learning speed and performance. |
| Researcher Affiliation | Academia | Yuechen Wu, Wei Zhang, Ke Song; School of Control Science and Engineering, Shandong University; {wuyuechen, songke, vsislab}@mail.sdu.edu.cn, davidzhangsdu@gmail.com |
| Pseudocode | Yes | Algorithm 1: Master-Slave Curriculum Learning |
| Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the described methodology. |
| Open Datasets | No | The paper states, "Evaluation is conducted on the ViZDoom platform" and describes several scenarios. While ViZDoom is a known platform, the paper does not specify a publicly available dataset with concrete access information (a link, or a citation with authors/year) used for training in their experiments. |
| Dataset Splits | No | The paper does not provide training/validation/test dataset splits with specific percentages, sample counts, or references to predefined splits needed for reproduction. It refers to 'n-step returns' and 'tmax steps', which concern the update process rather than data splitting (see the n-step return sketch after the table). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions "RMSProp was performed to optimize the network in TensorFlow" but does not provide version numbers for TensorFlow or any other software dependency. |
| Experiment Setup | Yes | For all experiments, we set the discount factor γ = 0.99, the RMSProp decay factor α = 0.99, the exploration rate ϵ = 0.1, and the entropy regularization term β = 0.01. ... In the experiment, we used 16 threads and performed updates after every 80 actions (i.e., tmax = 20 and m = 4). These values are collected into a config sketch after the table. |
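
The 'n-step returns' and 'tmax steps' mentioned under Dataset Splits describe the A3C-style update, not a data split. Below is a minimal sketch of how such returns are typically computed, following the standard A3C formulation (Mnih et al., 2016); the paper does not publish code, so the function name and structure here are illustrative assumptions rather than the authors' implementation.

```python
# Minimal sketch of n-step returns as used in A3C-style updates.
# Standard A3C formulation; the paper only mentions "n-step returns"
# over "tmax steps", so treat the details below as assumptions.

def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns for a rollout of up to tmax steps.

    rewards          -- list of rewards r_t collected during the rollout
    bootstrap_value  -- V(s_{t+n}) from the critic, 0.0 if terminal
    gamma            -- discount factor (0.99 in the paper's experiments)
    """
    returns = []
    R = bootstrap_value
    for r in reversed(rewards):   # walk the rollout backwards
        R = r + gamma * R         # R_t = r_t + gamma * R_{t+1}
        returns.append(R)
    returns.reverse()
    return returns

# Example: a 3-step rollout with a bootstrapped value of 1.0
print(n_step_returns([0.0, 0.0, 1.0], 1.0))  # [1.9504..., 1.9701, 1.99]
```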
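
For convenience, the hyperparameters quoted under Experiment Setup can be gathered into a single configuration. Only the values come from the paper's text; the key names below are assumptions, since no reference implementation is available.

```python
# Hedged sketch: the reported hyperparameters collected into one config.
# Key names are illustrative; only the numeric values are from the paper.

A3C_CONFIG = {
    "discount_gamma": 0.99,   # discount factor gamma
    "rmsprop_decay": 0.99,    # RMSProp decay factor alpha
    "exploration_eps": 0.1,   # exploration rate epsilon
    "entropy_beta": 0.01,     # entropy regularization weight beta
    "num_threads": 16,        # parallel actor-learner threads
    "t_max": 20,              # steps per rollout
    "m": 4,                   # rollouts per update
    "update_every": 20 * 4,   # = 80 actions between updates
}
```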