Exploratory Combinatorial Optimization with Reinforcement Learning
Authors: Thomas Barrett, William Clements, Jakob Foerster, Alex Lvovsky
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show our method to produce state-of-the-art RL performance on the Maximum Cut problem. |
| Researcher Affiliation | Collaboration | ¹University of Oxford, Oxford, UK; ²indust.ai, Paris, France; ³Facebook AI Research; ⁴Russian Quantum Center, Moscow, Russia |
| Pseudocode | Yes | Algorithm 1: Q-learning for ECO-DQN (a minimal environment sketch follows the table) |
| Open Source Code | Yes | Source code, including experimental scripts and all testing and validation graphs, can be found at https://github.com/tomdbar/eco-dqn. |
| Open Datasets | Yes | In this work we train and test the agent on both Erdős–Rényi (Erdős and Rényi 1960) and Barabási–Albert (Albert and Barabási 2002) graphs with edges w_ij ∈ {0, ±1} [...] Source code, including experimental scripts and all testing and validation graphs, can be found at https://github.com/tomdbar/eco-dqn. [...] Finally, we test ECO-DQN on publicly available datasets. The Physics dataset [...] The GSet is a well-investigated benchmark collection of graphs (a graph-generation sketch follows the table) |
| Dataset Splits | Yes | The performance over training (i.e. all learning curves) is evaluated as the mean of a fixed set of 50 held-out graphs from the same distribution. Once trained, the agents are tested on a separate set of 100 held-out validation graphs from a given distribution. |
| Hardware Specification | Yes | Experiments were performed on NVIDIA Tesla M60 GPUs. |
| Software Dependencies | No | The paper mentions software such as the NetworkX Python package and CPLEX but does not specify version numbers, which are required for reproducible software dependency information. |
| Experiment Setup | Yes | All agents are trained with a minibatch size of 64 and k=32 actions per step of gradient descent. The learning rate is 10⁻⁴ and the exploration rate is linearly decreased from ε=1 to ε=0.05 over the first 10% of training. In this work we use n=64 dimensional embedding vectors, and have K=3 rounds of message passing. (A configuration sketch follows the table.) |
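The pseudocode quoted above ("Algorithm 1: Q-learning for ECO-DQN") is not reproduced here, but the core mechanism it describes, an agent that may flip any vertex into or out of the solution set and is rewarded only when it improves on the best cut seen so far in the episode, can be sketched as a minimal environment. This is one reading of the paper's MDP, not the authors' released code; the class and method names are illustrative.

```python
import numpy as np

def cut_value(w, s):
    """Cut value of spin assignment s in {-1,+1}^n for symmetric weight
    matrix w: C(s) = (1/4) * sum_ij w_ij * (1 - s_i * s_j)."""
    return 0.25 * np.sum(w * (1.0 - np.outer(s, s)))

class MaxCutEnv:
    """Minimal exploratory Max-Cut environment in the spirit of ECO-DQN:
    every vertex flip is always allowed (solutions can be revisited), and
    the reward is the improvement over the best cut observed so far in
    the episode, normalized by |V| and clipped at zero. A sketch, not the
    paper's exact formulation."""

    def __init__(self, w):
        self.w = np.asarray(w, dtype=float)
        self.n = self.w.shape[0]

    def reset(self):
        # Random initial spin configuration; track the best cut seen.
        self.s = np.random.choice([-1.0, 1.0], size=self.n)
        self.best = cut_value(self.w, self.s)
        return self.s.copy()

    def step(self, action):
        self.s[action] *= -1.0                        # flip one vertex
        c = cut_value(self.w, self.s)
        reward = max(c - self.best, 0.0) / self.n     # reward only improvements
        self.best = max(self.best, c)
        return self.s.copy(), reward
```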
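For the training distributions quoted in the Open Datasets row, a minimal generation sketch using NetworkX might look as follows. The defaults p=0.15 and m=4 are assumptions for illustration and are not confirmed by the quoted text.

```python
import networkx as nx
import numpy as np

def random_signed_graph(n, kind="ER", p=0.15, m=4, seed=None):
    """Sample a graph and assign edge weights w_ij in {+1, -1}.

    kind="ER": Erdos-Renyi G(n, p); kind="BA": Barabasi-Albert with m
    attachments per new node. p=0.15 and m=4 are illustrative defaults.
    """
    rng = np.random.default_rng(seed)
    if kind == "ER":
        g = nx.erdos_renyi_graph(n, p, seed=seed)
    elif kind == "BA":
        g = nx.barabasi_albert_graph(n, m, seed=seed)
    else:
        raise ValueError(f"unknown graph kind: {kind}")
    for u, v in g.edges():
        g[u][v]["weight"] = int(rng.choice([-1, 1]))
    return g

# Example: one 50-node instance from each distribution.
er = random_signed_graph(50, kind="ER", seed=0)
ba = random_signed_graph(50, kind="BA", seed=0)
```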
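The quoted experiment setup pins down the main hyperparameters. A hedged sketch of the corresponding configuration and the linear ε-schedule, assuming the exploration rate stays at 0.05 after the first 10% of training, could be:

```python
from dataclasses import dataclass

@dataclass
class EcoDqnConfig:
    # Values quoted from the paper; field names are illustrative.
    minibatch_size: int = 64
    actions_per_update: int = 32      # k=32 actions per gradient step
    learning_rate: float = 1e-4
    eps_start: float = 1.0
    eps_end: float = 0.05
    eps_decay_frac: float = 0.10      # decay over first 10% of training
    embedding_dim: int = 64           # n=64 dimensional embeddings
    message_passing_rounds: int = 3   # K=3

def epsilon(cfg: EcoDqnConfig, step: int, total_steps: int) -> float:
    """Linear decay from eps_start to eps_end over the first
    eps_decay_frac of training, then constant (an assumption)."""
    frac = min(step / (cfg.eps_decay_frac * total_steps), 1.0)
    return cfg.eps_start + frac * (cfg.eps_end - cfg.eps_start)
```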