Graph-based Deterministic Policy Gradient for Repetitive Combinatorial Optimization Problems

Authors: Zhongyuan Zhao, Ananthram Swami, Santiago Segarra

ICLR 2023

Reproducibility assessment (each item gives the variable, the result, and the supporting quote returned by the LLM):
Research Type: Experimental. "Through numerical experiments on several discrete optimization problems, we show that our approach can learn reusable policies to reduce the optimality gap of fast (distributed) heuristics for independent repetitive COPs, and can optimize the long-term objectives for repetitive COPs embedded in graph-based Markov decision processes."
Researcher Affiliation: Collaboration. "Zhongyuan Zhao (Rice University, zhongyuan.zhao@rice.edu), Ananthram Swami (DEVCOM Army Research Laboratory, ananthram.swami.civ@army.mil), Santiago Segarra (Rice University, segarra@rice.edu)"
Pseudocode: Yes. "Algorithm 1 GDPG-Twin for R-COPs with Independent Instances" (see the illustrative sketch after this list).
Open Source Code: Yes. "Source code at https://github.com/XzrTGMu/twin-nphard."
Open Datasets: Yes. "For MWIS problem in Section 4.1, the test set of 500 ER graphs and the corresponding optimal solutions are from https://github.com/zhongyuanzhao/distgcn (Zhao et al., 2022a)."
Dataset Splits: No. "During training, random weights are generated on-the-fly for a training graph, whereas the weights on a validation graph are unchanged." (see the weight-protocol sketch after this list)
Hardware Specification: Yes. "The training takes about an hour on a server with 16 GB memory, 8 CPUs and 1 GPU (Nvidia 1080 Ti)."
Software Dependencies: No. "It should be noted that these runtimes are based on our Python code, of which the implementation could be further optimized for running speed."
Experiment Setup: Yes. "For the four demonstrated R-COPs with independent instances, we set the hyperparameters of training procedure in Algorithm 1 as follows: α_p = α_c = 0.0001, E = 25, B = 100, ϵ = 0.15." (see the configuration sketch after this list)
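To make the Pseudocode entry concrete, below is a minimal, hypothetical sketch of the kind of twin actor-critic update that Algorithm 1 (GDPG-Twin) describes: a graph neural network actor proposes modified node weights for a non-differentiable heuristic, and a second graph network (the "twin" critic) learns a differentiable approximation of the resulting objective so that a deterministic policy gradient can be passed back to the actor. The single-layer GCN, tensor shapes, and all function names here are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class SimpleGCN(nn.Module):
    """Tiny one-layer graph network standing in for the paper's actor and
    critic (twin) GCNs; the real architectures differ."""
    def __init__(self, in_dim, hidden=32):
        super().__init__()
        self.lin1 = nn.Linear(in_dim, hidden)
        self.lin2 = nn.Linear(hidden, 1)

    def forward(self, x, adj):
        h = torch.relu(adj @ self.lin1(x))      # one round of message passing
        return self.lin2(adj @ h).squeeze(-1)   # one scalar per node

actor = SimpleGCN(in_dim=1)    # node weight -> modified weight for the heuristic
critic = SimpleGCN(in_dim=2)   # twin: (weight, action) -> estimate of the objective
opt_actor = torch.optim.Adam(actor.parameters(), lr=1e-4)
opt_critic = torch.optim.Adam(critic.parameters(), lr=1e-4)

def train_step(x, adj, heuristic, objective):
    """One twin-style update on a single instance. `heuristic` is the
    non-differentiable solver (e.g. greedy MWIS) and `objective` scores its
    output; both are placeholders supplied by the caller."""
    # The actor proposes modified node weights; the heuristic runs outside autograd.
    action = actor(x, adj)
    with torch.no_grad():
        solution = heuristic(action, adj)
        reward = objective(solution, x)

    # Critic (twin) regresses the true objective achieved by the heuristic.
    q = critic(torch.stack([x.squeeze(-1), action.detach()], dim=-1), adj).mean()
    critic_loss = (q - reward) ** 2
    opt_critic.zero_grad()
    critic_loss.backward()
    opt_critic.step()

    # Deterministic policy gradient: improve the action as judged by the twin.
    q_pi = critic(torch.stack([x.squeeze(-1), actor(x, adj)], dim=-1), adj).mean()
    actor_loss = -q_pi
    opt_actor.zero_grad()
    actor_loss.backward()
    opt_actor.step()
    return float(reward)
```

The point of the twin is that the heuristic call never has to be differentiated; only the learned critic supplies gradients to the actor.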
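The Dataset Splits entry can likewise be read as a simple weight protocol: training graphs receive freshly sampled node weights every time they are used, while each validation graph keeps one fixed weight vector. The sketch below illustrates that protocol; the graph sizes, edge probability, and uniform weight distribution are assumptions for illustration, not the paper's settings.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(0)

# Illustrative Erdos-Renyi graphs; n and p are placeholders, not the paper's values.
train_graphs = [nx.erdos_renyi_graph(n=100, p=0.15, seed=s) for s in range(10)]
val_graphs = [nx.erdos_renyi_graph(n=100, p=0.15, seed=1000 + s) for s in range(5)]

# Validation weights are drawn once and then frozen for the rest of training.
val_weights = [rng.uniform(0.0, 1.0, g.number_of_nodes()) for g in val_graphs]

def training_batch():
    """Training pairs each graph with freshly sampled weights on every call."""
    return [(g, rng.uniform(0.0, 1.0, g.number_of_nodes())) for g in train_graphs]

def validation_batch():
    """Validation always pairs each graph with its fixed weight vector."""
    return list(zip(val_graphs, val_weights))
```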
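Finally, the hyperparameters quoted in the Experiment Setup entry could be gathered in a small configuration object like the one below. Only the numerical values come from the quote; the field names and the reading of E, B, and ϵ (epochs, instances per epoch, exploration parameter of Algorithm 1) are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class TrainConfig:
    """Training hyperparameters quoted for the four R-COPs with independent
    instances; the comments give an assumed interpretation of each symbol."""
    alpha_p: float = 1e-4  # actor (policy) learning rate, alpha_p in the paper
    alpha_c: float = 1e-4  # critic (twin) learning rate, alpha_c in the paper
    E: int = 25            # E in Algorithm 1 (assumed: number of training epochs)
    B: int = 100           # B in Algorithm 1 (assumed: instances per epoch/batch)
    eps: float = 0.15      # epsilon in Algorithm 1 (assumed: exploration parameter)

cfg = TrainConfig()
```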