Algorithms or Actions? A Study in Large-Scale Reinforcement Learning
Authors: Anderson Rocha Tavares, Sivasubramanian Anbalagan, Leandro Soriano Marcolino, Luiz Chaimowicz
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present synthetic experiments to further study such systems. Finally, we propose a function approximation approach, demonstrating the effectiveness of learning over algorithms in real-time strategy games. |
| Researcher Affiliation | Academia | Anderson Rocha Tavares¹, Sivasubramanian Anbalagan², Leandro Soriano Marcolino², Luiz Chaimowicz¹; ¹Computer Science Department, Universidade Federal de Minas Gerais; ²School of Computing and Communications, Lancaster University |
| Pseudocode | No | The paper describes algorithms textually and mathematically but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code of the synthetic and µRTS experiments is available at https://github.com/andertavares/syntheticmdps and https://github.com/SivaAnbalagan1/microrts-FA, respectively. |
| Open Datasets | No | In this paper we use µRTS, a simplified RTS game developed for AI research. We used the map basesWorkers24x24. No specific dataset is cited or linked for training; instead, the agent learns through interaction within the game environment. |
| Dataset Splits | No | No explicit mention of training/validation/test splits, percentages, or sample counts for reproduction. The training is described in terms of games played against opponents rather than specific dataset splits. |
| Hardware Specification | No | No specific hardware details (like CPU/GPU models, memory) are provided for running the experiments. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies or libraries used in the experiments. |
| Experiment Setup | Yes | We used the map basesWorkers24x24, and the best parametrization we found: α = 10⁻⁴, γ = 0.9, ϵ exponentially decaying from 0.2 against Puppet AB, Puppet MCTS and AHTN, and from 0.1 against Naive MCTS and StrategyTactics, decayed after every game (decay rate 0.9984). All games last at most 3000 cycles and are declared a draw on timeout. Rewards are -1, 0 and 1 for defeat, draw and victory, respectively. (A sketch of this schedule appears after the table.) |
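To make the reported setup concrete, here is a minimal Python sketch of the training schedule those hyperparameters describe: a per-opponent initial ϵ, exponential decay after every game, and terminal rewards of -1/0/1. Everything beyond the quoted numbers is an assumption: `run_episode` is a hypothetical stub standing in for one µRTS game, and the mapping of opponent names to initial ϵ is read directly from the quoted text, not from the authors' released code.

```python
import random

# Hyperparameters quoted in the Experiment Setup row.
ALPHA = 1e-4          # learning rate alpha = 10^-4 (not used by this stub)
GAMMA = 0.9           # discount factor (not used by this stub)
DECAY_RATE = 0.9984   # epsilon is multiplied by this after every game
MAX_CYCLES = 3000     # games time out and are declared a draw after 3000 cycles

# Initial exploration rate per opponent, as quoted from the paper.
INITIAL_EPSILON = {
    "Puppet AB": 0.2,
    "Puppet MCTS": 0.2,
    "AHTN": 0.2,
    "Naive MCTS": 0.1,
    "StrategyTactics": 0.1,
}

def reward(outcome: str) -> int:
    """Terminal reward: -1 for defeat, 0 for draw, 1 for victory."""
    return {"defeat": -1, "draw": 0, "victory": 1}[outcome]

def run_episode(opponent: str, epsilon: float, max_cycles: int) -> str:
    # Hypothetical placeholder: in the real experiments this plays one
    # full µRTS game against the named opponent.
    return random.choice(["defeat", "draw", "victory"])

def train(opponent: str, num_games: int) -> list[int]:
    """Play num_games against one opponent, decaying epsilon after each game."""
    epsilon = INITIAL_EPSILON[opponent]
    rewards = []
    for _ in range(num_games):
        outcome = run_episode(opponent, epsilon, MAX_CYCLES)
        rewards.append(reward(outcome))
        # Exponential decay applied once per game, per the paper.
        epsilon *= DECAY_RATE
    return rewards

if __name__ == "__main__":
    print(sum(train("Puppet AB", 100)))
```

Note the decay is per game, not per step: after 1000 games against Puppet AB, ϵ falls from 0.2 to roughly 0.2 × 0.9984¹⁰⁰⁰ ≈ 0.04.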