Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Training Agent for First-Person Shooter Game with Actor-Critic Curriculum Learning

Authors: Yuxin Wu, Yuandong Tian

ICLR 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we show the training procedure (Sec. 5.1), evaluate our AIs with ablation analysis (Sec. 5.2) and ViZDoom AI Competition (Sec. 5.3).
Researcher Affiliation | Collaboration | Yuxin Wu, Carnegie Mellon University, EMAIL; Yuandong Tian, Facebook AI Research, EMAIL
Pseudocode | No | The paper describes the model and training procedure in detail, but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | The paper mentions "tensorpack", which has a GitHub link (https://github.com/ppwwyyxx/tensorpack), but this is a framework used by the authors, not the specific code implementation for *this* paper's methodology. There is no explicit statement or link for the paper's own source code.
Open Datasets | No | The paper uses the ViZDoom platform and describes custom scenarios like "Flat Map" and "CIGTrack1", but does not provide any public access information (link, citation, repository) for the specific datasets or scenarios used in their experiments.
Dataset Splits | No | The paper mentions using "100 episodes" or "300 episodes" for evaluation but does not specify a formal train/validation/test split for a dataset.
Hardware Specification | Yes | The training procedure runs on Intel Xeon CPU E5-2680v2 at 2.80GHz, and 2 Titan X GPUs.
Software Dependencies | No | Our training procedure is implemented with TensorFlow [Abadi et al. (2016)] and tensorpack. (No version numbers provided for TensorFlow or tensorpack.)
Experiment Setup | Yes | We use Adam [Kingma & Ba (2014)] with ε = 10^{-3} for training. Batch size is 128, discount factor γ = 0.99, learning rate α = 10^{-4} and the policy learning rate β = 0.08α.
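
To make the quoted experiment setup easier to check, the hyperparameters can be collected in one place. The following is a minimal sketch, not the authors' code: the class and field names are hypothetical, the exponents are read as negative (10^-3 and 10^-4) on the assumption that the minus signs were lost in extraction, and the return-to-go helper is a generic illustration of how a discount factor γ is applied rather than the paper's exact return estimator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container; field names are illustrative, not from the paper."""
    adam_epsilon: float = 1e-3      # Adam epsilon, read as 10^-3 (assumed sign)
    batch_size: int = 128
    discount_factor: float = 0.99   # gamma
    learning_rate: float = 1e-4     # alpha, read as 10^-4 (assumed sign)
    policy_lr_ratio: float = 0.08   # beta = 0.08 * alpha

    @property
    def policy_learning_rate(self) -> float:
        # beta expressed as a fraction of the base learning rate alpha
        return self.policy_lr_ratio * self.learning_rate


def discounted_returns(rewards, gamma):
    """Return-to-go for each step: G_t = r_t + gamma * G_{t+1}."""
    returns = []
    running = 0.0
    # Walk the episode backwards so each step reuses the next step's return.
    for reward in reversed(rewards):
        running = reward + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


config = TrainingConfig()
print(config.policy_learning_rate)
print(discounted_returns([1.0, 0.0, 1.0], config.discount_factor))
```

Keeping β as a ratio of α rather than as a separate absolute value mirrors how the setup states it, so changing the base learning rate rescales the policy learning rate automatically.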