Graph-based Deterministic Policy Gradient for Repetitive Combinatorial Optimization Problems

Authors: Zhongyuan Zhao, Ananthram Swami, Santiago Segarra

ICLR 2023

Reproducibility assessment (each item gives the variable, the result, and the supporting quote returned by the LLM):
Research Type: Experimental. "Through numerical experiments on several discrete optimization problems, we show that our approach can learn reusable policies to reduce the optimality gap of fast (distributed) heuristics for independent repetitive COPs, and can optimize the long-term objectives for repetitive COPs embedded in graph-based Markov decision processes."
Researcher Affiliation: Collaboration. "Zhongyuan Zhao (Rice University, zhongyuan.zhao@rice.edu), Ananthram Swami (DEVCOM Army Research Laboratory, ananthram.swami.civ@army.mil), Santiago Segarra (Rice University, segarra@rice.edu)"
Pseudocode: Yes. "Algorithm 1 GDPG-Twin for R-COPs with Independent Instances" (see the illustrative sketch after this list).
Open Source Code: Yes. "Source code at https://github.com/XzrTGMu/twin-nphard."
Open Datasets: Yes. "For MWIS problem in Section 4.1, the test set of 500 ER graphs and the corresponding optimal solutions are from https://github.com/zhongyuanzhao/distgcn (Zhao et al., 2022a)."
Dataset Splits: No. "During training, random weights are generated on-the-fly for a training graph, whereas the weights on a validation graph are unchanged." (see the weight-protocol sketch after this list)
Hardware Specification: Yes. "The training takes about an hour on a server with 16 GB memory, 8 CPUs and 1 GPU (Nvidia 1080 Ti)."
Software Dependencies: No. "It should be noted that these runtimes are based on our Python code, of which the implementation could be further optimized for running speed."
Experiment Setup: Yes. "For the four demonstrated R-COPs with independent instances, we set the hyperparameters of training procedure in Algorithm 1 as follows: α_p = α_c = 0.0001, E = 25, B = 100, ϵ = 0.15." (see the configuration sketch after this list)
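To make the Pseudocode entry concrete, below is a minimal, hypothetical sketch of the kind of twin actor-critic update that Algorithm 1 (GDPG-Twin) describes: a graph neural network actor proposes modified node weights for a non-differentiable heuristic, and a second graph network (the "twin" critic) learns a differentiable approximation of the resulting objective so that a deterministic policy gradient can be passed back to the actor. The single-layer GCN, tensor shapes, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SimpleGCN(nn.Module):
    """Tiny one-layer graph network standing in for the paper's actor and
    critic (twin) GCNs; the real architectures differ."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        h = torch.relu(adj @ self.lin1(x))      # one round of message passing
        return self.lin2(adj @ h).squeeze(-1)   # one scalar per node

actor = SimpleGCN(in_dim=1)    # node weight -> modified weight for the heuristic
critic = SimpleGCN(in_dim=2)   # twin: (weight, action) -> estimate of the objective
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-4)

def train_step(x, adj, heuristic, objective):
    """One twin-style update on a single instance. `heuristic` is the
    non-differentiable solver (e.g. greedy MWIS) and `objective` scores its
    output; both are placeholders supplied by the caller."""
    # The actor proposes modified node weights; the heuristic runs outside autograd.
    action = actor(x, adj)
    with torch.no_grad():
        solution = heuristic(action, adj)
        reward = objective(solution, x)

    # Critic (twin) regresses the true objective achieved by the heuristic.
    q = critic(torch.stack([x.squeeze(-1), action.detach()], dim=-1), adj).mean()
    critic_loss = (q - reward) ** 2
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Deterministic policy gradient: improve the action as judged by the twin.
    q_pi = critic(torch.stack([x.squeeze(-1), actor(x, adj)], dim=-1), adj).mean()
    actor_loss = -q_pi
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()
    return float(reward)
```

The point of the twin is that the heuristic call never has to be differentiated; only the learned critic supplies gradients to the actor.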
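The Dataset Splits entry can likewise be read as a simple weight protocol: training graphs receive freshly sampled node weights every time they are used, while each validation graph keeps one fixed weight vector. The sketch below illustrates that protocol; the graph sizes, edge probability, and uniform weight distribution are assumptions for illustration, not the paper's settings.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

# Illustrative Erdos-Renyi graphs; n and p are placeholders, not the paper's values.
train_graphs = [nx.erdos_renyi_graph(n=100, p=0.15, seed=s) for s in range(10)]
val_graphs = [nx.erdos_renyi_graph(n=100, p=0.15, seed=1000 + s) for s in range(5)]

# Validation weights are drawn once and then frozen for the rest of training.
val_weights = [rng.uniform(0.0, 1.0, g.number_of_nodes()) for g in val_graphs]

def training_batch():
    """Training pairs each graph with freshly sampled weights on every call."""
    return [(g, rng.uniform(0.0, 1.0, g.number_of_nodes())) for g in train_graphs]

def validation_batch():
    """Validation always pairs each graph with its fixed weight vector."""
    return list(zip(val_graphs, val_weights))
```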
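Finally, the hyperparameters quoted in the Experiment Setup entry could be gathered in a small configuration object like the one below. Only the numerical values come from the quote; the field names and the reading of E, B, and ϵ (epochs, instances per epoch, exploration parameter of Algorithm 1) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Training hyperparameters quoted for the four R-COPs with independent
    instances; the comments give an assumed interpretation of each symbol."""
    alpha_p: float = 1e-4  # actor (policy) learning rate, alpha_p in the paper
    alpha_c: float = 1e-4  # critic (twin) learning rate, alpha_c in the paper
    E: int = 25            # E in Algorithm 1 (assumed: number of training epochs)
    B: int = 100           # B in Algorithm 1 (assumed: instances per epoch/batch)
    eps: float = 0.15      # epsilon in Algorithm 1 (assumed: exploration parameter)

cfg = TrainConfig()
```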