Exploratory Combinatorial Optimization with Reinforcement Learning
Authors: Thomas Barrett, William Clements, Jakob Foerster, Alex Lvovsky
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show our method to produce state-of-the-art RL performance on the Maximum Cut problem. |
| Researcher Affiliation | Collaboration | ¹University of Oxford, Oxford, UK; ²indust.ai, Paris, France; ³Facebook AI Research; ⁴Russian Quantum Center, Moscow, Russia |
| Pseudocode | Yes | Algorithm 1: Q-learning for ECO-DQN (a minimal environment sketch follows the table) |
| Open Source Code | Yes | Source code, including experimental scripts and all testing and validation graphs, can be found at https://github.com/tomdbar/eco-dqn. |
| Open Datasets | Yes | In this work we train and test the agent on both Erdős–Rényi (Erdős and Rényi 1960) and Barabási–Albert (Albert and Barabási 2002) graphs with edges w_ij ∈ {0, ±1} [...] Source code, including experimental scripts and all testing and validation graphs, can be found at https://github.com/tomdbar/eco-dqn. [...] Finally, we test ECO-DQN on publicly available datasets. The Physics dataset [...] The GSet is a well-investigated benchmark collection of graphs (a graph-generation sketch follows the table) |
| Dataset Splits | Yes | The performance over training (i.e. all learning curves) is evaluated as the mean of a fixed set of 50 held-out graphs from the same distribution. Once trained, the agents are tested on a separate set of 100 held-out validation graphs from a given distribution. |
| Hardware Specification | Yes | Experiments were performed on NVIDIA Tesla M60 GPUs. |
| Software Dependencies | No | The paper mentions software such as the NetworkX Python package and CPLEX but does not specify version numbers, which are required for reproducible software dependency information. |
| Experiment Setup | Yes | All agents are trained with a minibatch size of 64 and k=32 actions per step of gradient descent. The learning rate is 10⁻⁴ and the exploration rate is linearly decreased from ε=1 to ε=0.05 over the first 10% of training. In this work we use n=64 dimensional embedding vectors, and have K=3 rounds of message passing. (A configuration sketch follows the table.) |
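The pseudocode quoted above ("Algorithm 1: Q-learning for ECO-DQN") is not reproduced here, but the core mechanism it describes, an agent that may flip any vertex into or out of the solution set and is rewarded only when it improves on the best cut seen so far in the episode, can be sketched as a minimal environment. This is one reading of the paper's MDP, not the authors' released code; the class and method names are illustrative.

```python
import numpy as np

def cut_value(w, s):
    """Cut value of spin assignment s in {-1,+1}^n for symmetric weight
    matrix w: C(s) = (1/4) * sum_ij w_ij * (1 - s_i * s_j)."""
    return 0.25 * np.sum(w * (1.0 - np.outer(s, s)))

class MaxCutEnv:
    """Minimal exploratory Max-Cut environment in the spirit of ECO-DQN:
    every vertex flip is always allowed (solutions can be revisited), and
    the reward is the improvement over the best cut observed so far in
    the episode, normalized by |V| and clipped at zero. A sketch, not the
    paper's exact formulation."""

    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)
        self.n = self.w.shape[0]

    def reset(self):
        # Random initial spin configuration; track the best cut seen.
        self.s = np.random.choice([-1.0, 1.0], size=self.n)
        self.best = cut_value(self.w, self.s)
        return self.s.copy()

    def step(self, action):
        self.s[action] *= -1.0                        # flip one vertex
        c = cut_value(self.w, self.s)
        reward = max(c - self.best, 0.0) / self.n     # reward only improvements
        self.best = max(self.best, c)
        return self.s.copy(), reward
```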
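For the training distributions quoted in the Open Datasets row, a minimal generation sketch using NetworkX might look as follows. The defaults p=0.15 and m=4 are assumptions for illustration and are not confirmed by the quoted text.

```python
import networkx as nx
import numpy as np

def random_signed_graph(n, kind="ER", p=0.15, m=4, seed=None):
    """Sample a graph and assign edge weights w_ij in {+1, -1}.

    kind="ER": Erdos-Renyi G(n, p); kind="BA": Barabasi-Albert with m
    attachments per new node. p=0.15 and m=4 are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    if kind == "ER":
        g = nx.erdos_renyi_graph(n, p, seed=seed)
    elif kind == "BA":
        g = nx.barabasi_albert_graph(n, m, seed=seed)
    else:
        raise ValueError(f"unknown graph kind: {kind}")
    for u, v in g.edges():
        g[u][v]["weight"] = int(rng.choice([-1, 1]))
    return g

# Example: one 50-node instance from each distribution.
er = random_signed_graph(50, kind="ER", seed=0)
ba = random_signed_graph(50, kind="BA", seed=0)
```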
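The quoted experiment setup pins down the main hyperparameters. A hedged sketch of the corresponding configuration and the linear ε-schedule, assuming the exploration rate stays at 0.05 after the first 10% of training, could be:

```python
from dataclasses import dataclass

@dataclass
class EcoDqnConfig:
    # Values quoted from the paper; field names are illustrative.
    minibatch_size: int = 64
    actions_per_update: int = 32      # k=32 actions per gradient step
    learning_rate: float = 1e-4
    eps_start: float = 1.0
    eps_end: float = 0.05
    eps_decay_frac: float = 0.10      # decay over first 10% of training
    embedding_dim: int = 64           # n=64 dimensional embeddings
    message_passing_rounds: int = 3   # K=3

def epsilon(cfg: EcoDqnConfig, step: int, total_steps: int) -> float:
    """Linear decay from eps_start to eps_end over the first
    eps_decay_frac of training, then constant (an assumption)."""
    frac = min(step / (cfg.eps_decay_frac * total_steps), 1.0)
    return cfg.eps_start + frac * (cfg.eps_end - cfg.eps_start)
```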