Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Exploratory Combinatorial Optimization with Reinforcement Learning

Authors: Thomas Barrett, William Clements, Jakob Foerster, Alex Lvovsky3243-3250

AAAI 2020 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Experimentally, we show our method to produce state-of-the-art RL performance on the Maximum Cut problem.
Researcher Affiliation Collaboration 1University of Oxford, Oxford, UK 2indust.ai, Paris, France 3Facebook AI Research 4Russian Quantum Center, Moscow, Russia
Pseudocode Yes Algorithm 1: Q-learning for ECO-DQN
Open Source Code Yes Source code, including experimental scripts and all testing and validation graphs, can be found at https://github.com/tomdbar/eco-dqn.
Open Datasets Yes In this work we train and test the agent on both Erd os-R enyi (Erd os and R enyi 1960) and Barabasi-Albert (Albert and Barab asi 2002) graphs with edges wij {0, 1} [...] Source code, including experimental scripts and all testing and validation graphs, can be found at https://github.com/tomdbar/eco-dqn. [...] Finally, we test ECO-DQN on publicly available datasets. The Physics dataset [...] The GSet is a well-investigated benchmark collection of graphs
Dataset Splits Yes The performance over training (i.e. all learning curves) is evaluated as the mean of a fixed set of 50 held-out graphs from the same distribution. Once trained, the agents are tested on a separate set of 100 held-out validation graphs from a given distribution.
Hardware Specification Yes Experiments were performed on NVIDIA Tesla M60 GPUs.
Software Dependencies No The paper mentions software like 'Network X Python package' and 'CPLEX' but does not specify their version numbers, which is required for reproducible software dependency information.
Experiment Setup Yes All agents are trained with a minibatch sizes of 64 and k=32 actions per step of gradient descent. The learning rate is 10 4 and the exploration rate is linearly decreased from ε=1 to ε=0.05 over the first 10 % of training. In this work we use n=64 dimensional embedding vectors, and have K=3 rounds of message passing.