Algorithms or Actions? A Study in Large-Scale Reinforcement Learning

Authors: Anderson Rocha Tavares, Sivasubramanian Anbalagan, Leandro Soriano Marcolino, Luiz Chaimowicz

IJCAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present synthetic experiments to further study such systems. Finally, we propose a function approximation approach, demonstrating the effectiveness of learning over algorithms in real-time strategy games.
Researcher Affiliation | Academia | Anderson Rocha Tavares (1), Sivasubramanian Anbalagan (2), Leandro Soriano Marcolino (2), Luiz Chaimowicz (1); (1) Computer Science Department, Universidade Federal de Minas Gerais; (2) School of Computing and Communications, Lancaster University
Pseudocode | No | The paper describes algorithms textually and mathematically but does not include structured pseudocode or algorithm blocks.
Open Source Code | Yes | The source code of the synthetic and µRTS experiments is available at https://github.com/andertavares/syntheticmdps and https://github.com/SivaAnbalagan1/micrortsFA, respectively.
Open Datasets | No | In this paper we use µRTS, a simplified RTS game developed for AI research. We used the map basesWorkers24x24. No specific dataset is cited or linked for training; rather, the interaction occurs within the game environment.
Dataset Splits | No | No explicit mention of training/validation/test splits, percentages, or sample counts for reproduction. Training is described in terms of games played against opponents rather than specific dataset splits.
Hardware Specification | No | No specific hardware details (such as CPU/GPU models or memory) are provided for running the experiments.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | We used the map basesWorkers24x24 and the best parametrization we found: α = 10^-4, γ = 0.9, ε exponentially decaying from 0.2 against Puppet AB, Puppet MCTS and AHTN, and from 0.1 against Naive MCTS and Strategy Tactics, decayed after every game (decay rate 0.9984). All games last at most 3000 cycles and are declared a draw on timeout. Rewards are -1, 0 or 1 for defeat, draw and victory, respectively.
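
To make the Experiment Setup row concrete, below is a minimal Python sketch of the training loop it describes: ε-greedy selection over a portfolio of algorithms, with ε multiplied by 0.9984 after every game, matches capped at 3000 cycles (timeouts scored as draws), and terminal rewards of -1, 0 or 1. The portfolio names, the play_game stand-in, and the stateless value update are assumptions for illustration only; the paper's released code uses function approximation over game state and the actual µRTS engine.

    import random

    # Hyperparameters quoted in the Experiment Setup row above. Everything else
    # in this sketch (portfolio names, the fake match, the bandit-style update)
    # is an illustrative assumption, not the paper's released implementation.
    ALPHA = 1e-4            # learning rate, alpha = 10^-4
    GAMMA = 0.9             # discount factor; unused here since only a terminal reward is seen
    EPSILON_DECAY = 0.9984  # exploration decay applied after every game
    MAX_CYCLES = 3000       # games reaching this length are declared a draw

    # "Learning over algorithms": actions are members of an algorithm portfolio,
    # not low-level unit commands. These names are placeholders.
    PORTFOLIO = ["WorkerRush", "LightRush", "RangedRush", "HeavyRush"]

    def epsilon_at(game_index, epsilon0):
        """Exploration rate used in game `game_index` (0-based), decayed per game."""
        return epsilon0 * (EPSILON_DECAY ** game_index)

    def select_algorithm(q_values, epsilon):
        """Epsilon-greedy choice over the portfolio."""
        if random.random() < epsilon:
            return random.randrange(len(PORTFOLIO))
        return max(range(len(PORTFOLIO)), key=lambda a: q_values[a])

    def terminal_reward(outcome):
        """Reward of -1, 0 or +1 for defeat, draw and victory, respectively."""
        return {"defeat": -1.0, "draw": 0.0, "victory": 1.0}[outcome]

    def play_game(algorithm_name):
        """Stand-in for a microRTS match; a real match runs at most MAX_CYCLES cycles."""
        cycles = random.randrange(1, MAX_CYCLES + 1)  # fake game length
        if cycles == MAX_CYCLES:
            return "draw"  # timeout is scored as a draw
        return random.choice(["defeat", "victory"])

    def train(num_games=1000, epsilon0=0.2):
        """epsilon0 is 0.2 or 0.1 in the paper, depending on the opponent."""
        q_values = [0.0] * len(PORTFOLIO)  # one value per algorithm (stateless sketch)
        for g in range(num_games):
            eps = epsilon_at(g, epsilon0)
            a = select_algorithm(q_values, eps)
            r = terminal_reward(play_game(PORTFOLIO[a]))
            # Move the selected algorithm's value toward the terminal reward.
            q_values[a] += ALPHA * (r - q_values[a])
        return q_values

    if __name__ == "__main__":
        print(train())

With the quoted decay rate, ε falls to roughly 0.2 × 0.9984^1000 ≈ 0.04 after 1000 games; the paper's actual learner also conditions its choice on game state through function approximation, rather than keeping one value per algorithm as this stateless sketch does.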