Master-Slave Curriculum Design for Reinforcement Learning

Authors: Yuechen Wu, Wei Zhang, Ke Song

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluation on the ViZDoom platform demonstrates that the joint learning of the master agent and slave agents mutually benefits both. Significant improvement over A3C is obtained in terms of learning speed and performance.
Researcher Affiliation | Academia | Yuechen Wu, Wei Zhang, Ke Song, School of Control Science and Engineering, Shandong University, {wuyuechen, songke vsislab}@mail.sdu.edu.cn, davidzhangsdu@gmail.com
Pseudocode | Yes | Algorithm 1: Master-Slave Curriculum Learning
Open Source Code | No | The paper does not provide an explicit statement or a link to open-source code for the described methodology.
Open Datasets | No | The paper states, "Evaluation is conducted on the Viz Doom platform" and describes several scenarios. While ViZDoom is a known platform, the paper does not specify a publicly available dataset with concrete access information (link, or citation with authors/year) used for training in their specific experiments.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits with specific percentages, sample counts, or references to predefined splits needed for reproduction. It refers to n-step returns and t_max-step rollouts, which concern the update process rather than data splitting.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper mentions that "RMSProp was performed to optimize the network in TensorFlow," but does not provide specific version numbers for TensorFlow or any other software dependencies.
Experiment Setup | Yes | For all experiments, we set the discount factor γ = 0.99, the RMSProp decay factor α = 0.99, the exploration rate ϵ = 0.1, and the entropy regularization term β = 0.01. ... In the experiment, we used 16 threads and performed updates after every 80 actions (i.e., t_max = 20 and m = 4).
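
For readers attempting a reproduction, the hyperparameters quoted above can be collected into a single configuration. The following is a minimal sketch, assuming a standard A3C-style asynchronous setup; the variable names and structure are ours, not the authors', and the reading that the 80-action update interval equals t_max × m is an assumption consistent with the quoted numbers.

```python
# Hyperparameters quoted from the paper's experiment-setup description.
# The dictionary keys and the t_max * m interpretation are illustrative assumptions,
# not taken from the authors' (unreleased) code.
A3C_CONFIG = {
    "discount_gamma": 0.99,       # discount factor γ
    "rmsprop_decay_alpha": 0.99,  # RMSProp decay factor α
    "exploration_epsilon": 0.1,   # exploration rate ε
    "entropy_beta": 0.01,         # entropy regularization term β
    "num_threads": 16,            # asynchronous learner threads
    "t_max": 20,                  # steps per rollout before an update
    "m": 4,                       # rollouts per update (assumed meaning of m)
}

# Updates are performed every 80 actions, which matches t_max * m = 20 * 4.
update_interval = A3C_CONFIG["t_max"] * A3C_CONFIG["m"]
assert update_interval == 80
```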
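Since the Dataset Splits entry notes that training relies on n-step returns over t_max-step rollouts rather than on a fixed data split, a short sketch of how such returns are typically computed may be useful. This is the standard A3C-style backward recursion with the quoted discount factor γ = 0.99, not the authors' implementation.

```python
def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    """Compute discounted n-step returns for one rollout of at most t_max rewards.

    Standard A3C-style bootstrapping: R_t = r_t + gamma * R_{t+1}, seeded with the
    critic's value estimate for the state reached when the rollout is cut off.
    """
    returns = []
    R = bootstrap_value
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns

# Example: a 4-step rollout with a bootstrapped value of 0.5 at the cut-off state.
print(n_step_returns([1.0, 0.0, 0.0, 1.0], bootstrap_value=0.5, gamma=0.99))
```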