Policy-Based Self-Competition for Planning Problems

Authors: Jonathan Pirnay, Quirin Göttl, Jakob Burger, Dominik Gerhard Grimm

Venue: ICLR 2023

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We show the effectiveness of our approach in two well-known combinatorial optimization problems, the Traveling Salesman Problem and the Job-Shop Scheduling Problem. With only half of the simulation budget for search, GAZ PTP consistently outperforms all selected single-player variants of GAZ. |
| Researcher Affiliation | Academia | Jonathan Pirnay (1,2), Quirin Göttl (1), Jakob Burger (1) & Dominik G. Grimm (1,2); (1) Technical University of Munich, Campus Straubing for Biotechnology and Sustainability; (2) Weihenstephan-Triesdorf University of Applied Sciences |
| Pseudocode | Yes | Algorithm 1: GAZ Play-to-Plan (GAZ PTP) Training ... Algorithm 2: GAZ Greedy Scalar Training |
| Open Source Code | Yes | Our code in PyTorch (Paszke et al., 2017) is available on https://github.com/grimmlab/policy-based-self-competition. |
| Open Datasets | Yes | We report the average performance of all GAZ variants on the 10,000 test instances of Kool et al. (2018)... We report results on instances of medium size 15×15, 20×20, and 30×20 of the well-known Taillard benchmark set (Taillard, 1993)... |
| Dataset Splits | Yes | During training, the model is evaluated periodically on a fixed validation set of 100 states to determine the final model. |
| Hardware Specification | No | The authors gratefully acknowledge the Leibniz Supercomputing Centre for providing computing time on its Linux-Cluster. |
| Software Dependencies | No | Our code in PyTorch (Paszke et al., 2017) is available on https://github.com/grimmlab/policy-based-self-competition. The paper mentions PyTorch but does not provide a specific version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We set the constants to c_visit = 50 and c_scale = 1.0... We set the self-play parameter to γ = 0.2. We use Adam (Kingma & Ba, 2014) as an optimizer, with a constant learning rate of 10⁻⁴, sampling batches of size 256 at each training step. Gradients are clipped to unit L2-norm. |
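For context, the optimizer details quoted in the Experiment Setup row, together with the periodic validation mentioned under Dataset Splits, correspond to a standard PyTorch training step. The following is a minimal, runnable sketch under stated assumptions: it only reproduces the quoted settings (Adam, constant learning rate 10⁻⁴, batch size 256, gradient clipping to unit L2-norm, best-model selection on a fixed validation set of 100 states); the network, batch data, evaluation metric, and evaluation interval are dummy placeholders, not the authors' GAZ PTP implementation.

```python
import torch
import torch.nn as nn

# Sketch of the training configuration quoted in the table above.
# The network, data, metric, and interval below are illustrative stand-ins.
LEARNING_RATE = 1e-4   # "constant learning rate of 10^-4"
BATCH_SIZE = 256       # "sampling batches of size 256 at each training step"
MAX_GRAD_NORM = 1.0    # "Gradients are clipped to unit L2-norm."
NUM_STEPS = 1_000      # illustrative
EVAL_INTERVAL = 100    # illustrative; the paper only says "periodically"

model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))  # dummy network
optimizer = torch.optim.Adam(model.parameters(), lr=LEARNING_RATE)

validation_states = torch.randn(100, 16)  # "a fixed validation set of 100 states"
best_score = float("-inf")

for step in range(NUM_STEPS):
    # Dummy batch standing in for sampled training data.
    states = torch.randn(BATCH_SIZE, 16)
    targets = torch.randn(BATCH_SIZE, 1)

    loss = nn.functional.mse_loss(model(states), targets)

    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), MAX_GRAD_NORM)  # unit L2-norm clipping
    optimizer.step()

    # Periodic evaluation on the fixed validation set "to determine the final model".
    if (step + 1) % EVAL_INTERVAL == 0:
        with torch.no_grad():
            score = -nn.functional.mse_loss(
                model(validation_states), torch.zeros(100, 1)
            ).item()  # dummy validation metric
        if score > best_score:
            best_score = score
            torch.save(model.state_dict(), "best_model.pt")
```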