Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Search-based Reinforcement Learning through Bandit Linear Optimization

Authors: Milan Peelman, Antoon Bronselaer, Guy De Tré

IJCAI 2022 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In our experiments we are interested in the comparison of three algorithms. In Figure 2a we can see the (relative) Elo rating of the three algorithms when each algorithm gets 50 search iterations per turn." (from Section 5, "Experiments")
Researcher Affiliation | Academia | Milan Peelman, Antoon Bronselaer, Guy De Tré - Ghent University, EMAIL
Pseudocode | No | No explicit pseudocode or algorithm blocks are provided in the paper.
Open Source Code | Yes | Pod Pursuit implementation at: https://github.com/mpeelm/Pod-Pursuit
Open Datasets | Yes | Pod Pursuit implementation at: https://github.com/mpeelm/Pod-Pursuit
Dataset Splits | No | "Each algorithm uses the outcomes and computed policies from self-play games to update an MLP with three hidden layers with dimension 88 and ReLU as activation function." - No mention of validation splits.
Hardware Specification | No | "The game is simple enough to enable successful training on high-end consumer hardware" - This statement is vague and does not provide specific hardware details.
Software Dependencies | No | "Each algorithm uses the outcomes and computed policies from self-play games to update an MLP with three hidden layers with dimension 88 and ReLU as activation function. The optimizer used is SGD with momentum (0.9) and a constant learning rate of 0.1." - While components such as the MLP, ReLU, and SGD are mentioned, no specific software libraries or version numbers are given.
Experiment Setup | Yes | "The parameters for the noise are ϵ = 0.25 and α = 1. ... Each algorithm uses the outcomes and computed policies from self-play games to update an MLP with three hidden layers with dimension 88 and ReLU as activation function. The optimizer used is SGD with momentum (0.9) and a constant learning rate of 0.1. ... Lastly, we use a discount factor γ of 0.99 and the constant c in the definition of λ_N is set to 1. ... Table 1: Parameter values for Pod Pursuit" (a hedged sketch of this configuration follows the table)
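
To make the quoted configuration concrete, below is a minimal sketch of how the described network and optimizer might be instantiated, assuming a PyTorch implementation. Only the numbers quoted above (three hidden layers of dimension 88, ReLU, SGD with momentum 0.9 and learning rate 0.1, γ = 0.99, ϵ = 0.25, α = 1, c = 1) come from the paper; the input/output dimensions, the policy/value head split, and all names are hypothetical illustrations, not the authors' code.

```python
# Hypothetical sketch of the quoted training configuration (PyTorch assumed).
# Input size, action size, and the two-head layout are illustrative guesses.
import torch
import torch.nn as nn

class PolicyValueMLP(nn.Module):
    """MLP with three hidden layers of dimension 88 and ReLU activations,
    as quoted in the Experiment Setup row. Head structure is an assumption."""
    def __init__(self, in_dim: int = 16, n_actions: int = 8):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(in_dim, 88), nn.ReLU(),
            nn.Linear(88, 88), nn.ReLU(),
            nn.Linear(88, 88), nn.ReLU(),
        )
        self.policy_head = nn.Linear(88, n_actions)  # move logits
        self.value_head = nn.Linear(88, 1)           # game-outcome estimate

    def forward(self, x: torch.Tensor):
        h = self.body(x)
        return self.policy_head(h), torch.tanh(self.value_head(h))

model = PolicyValueMLP()

# Optimizer as quoted: SGD with momentum 0.9 and a constant learning rate of 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Remaining quoted constants; how they enter the search procedure is
# defined in the paper, not reproduced here.
GAMMA = 0.99    # discount factor
EPSILON = 0.25  # noise mixing parameter ϵ
ALPHA = 1.0     # noise parameter α
C_LAMBDA = 1.0  # constant c in the definition of λ_N
```

This sketch only shows that the quoted hyperparameters fully determine the network shape and optimizer; it deliberately omits the self-play and search loop, which is the paper's actual contribution.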