Graph-based Deterministic Policy Gradient for Repetitive Combinatorial Optimization Problems
Authors: Zhongyuan Zhao, Ananthram Swami, Santiago Segarra
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through numerical experiments on several discrete optimization problems, we show that our approach can learn reusable policies to reduce the optimality gap of fast (distributed) heuristics for independent repetitive COPs, and can optimize the long-term objectives for repetitive COPs embedded in graph-based Markov decision processes. |
| Researcher Affiliation | Collaboration | Zhongyuan Zhao, Rice University, zhongyuan.zhao@rice.edu; Ananthram Swami, DEVCOM Army Research Laboratory, ananthram.swami.civ@army.mil; Santiago Segarra, Rice University, segarra@rice.edu |
| Pseudocode | Yes | Algorithm 1 GDPG-Twin for R-COPs with Independent Instances (a hedged training-loop sketch follows the table). |
| Open Source Code | Yes | Source code at https://github.com/XzrTGMu/twin-nphard. |
| Open Datasets | Yes | For the MWIS problem in Section 4.1, the test set of 500 ER graphs and the corresponding optimal solutions are from https://github.com/zhongyuanzhao/distgcn (Zhao et al., 2022a). |
| Dataset Splits | No | During training, random weights are generated on-the-fly for a training graph, whereas the weights on a validation graph are unchanged (see the second sketch after the table). |
| Hardware Specification | Yes | The training takes about an hour on a server with 16 GB memory, 8 CPUs and 1 GPU (Nvidia 1080 Ti). |
| Software Dependencies | No | It should be noted that these runtimes are based on our Python code, whose implementation could be further optimized for running speed. |
| Experiment Setup | Yes | For the four demonstrated R-COPs with independent instances, we set the hyperparameters of the training procedure in Algorithm 1 as follows: αp = αc = 0.0001, E = 25, B = 100, ϵ = 0.15. |
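
To make the Pseudocode and Experiment Setup rows concrete, here is a minimal, hedged sketch of an actor-critic ("twin") training loop in the spirit of Algorithm 1, specialized to MWIS on random ER graphs. It assumes the actor is a GCN that adjusts node weights before a non-differentiable greedy heuristic solves each instance, and that the critic is a twin GCN trained as a differentiable surrogate of the heuristic's objective. All class and function names, network sizes, and the greedy heuristic itself are illustrative assumptions, not the authors' implementation; only the reported hyperparameters (αp = αc = 0.0001, E = 25, B = 100, ϵ = 0.15) are taken from the paper.

```python
# Hedged sketch only: architecture, helpers, and the greedy heuristic are
# assumptions; hyperparameters (alpha_p, alpha_c, E, B, eps) are as reported.
import torch
import torch.nn as nn

def erdos_renyi(n=32, p=0.15):
    """Random ER adjacency matrix (symmetric, no self-loops)."""
    a = (torch.rand(n, n) < p).float()
    a = torch.triu(a, diagonal=1)
    return a + a.t()

def greedy_mwis(adj, w):
    """Non-differentiable heuristic: repeatedly take the heaviest remaining
    vertex and delete its neighbors; returns a 0/1 solution indicator."""
    active = torch.ones(w.numel(), dtype=torch.bool)
    sol = torch.zeros(w.numel())
    for v in torch.argsort(w, descending=True):
        if active[v]:
            sol[v] = 1.0
            active[v] = False
            active[adj[v] > 0] = False
    return sol

def normalize(adj):
    """Symmetrically normalized adjacency with self-loops (A_hat)."""
    a = adj + torch.eye(adj.size(0))
    d = a.sum(1).rsqrt()
    return d[:, None] * a * d[None, :]

class GCN(nn.Module):
    """Two-layer graph convolution: A_hat relu(A_hat X W1) W2."""
    def __init__(self, din, dh, dout):
        super().__init__()
        self.l1, self.l2 = nn.Linear(din, dh), nn.Linear(dh, dout)
    def forward(self, a_hat, x):
        return self.l2(a_hat @ torch.relu(self.l1(a_hat @ x)))

actor = GCN(1, 32, 1)   # proposes per-node weight adjustments
critic = GCN(2, 32, 1)  # differentiable "twin" surrogate of the heuristic
opt_a = torch.optim.Adam(actor.parameters(), lr=1e-4)   # alpha_p = 0.0001
opt_c = torch.optim.Adam(critic.parameters(), lr=1e-4)  # alpha_c = 0.0001
E, B, eps = 25, 100, 0.15

for epoch in range(E):
    for _ in range(B):
        adj = erdos_renyi()
        a_hat = normalize(adj)
        w = torch.rand(adj.size(0), 1)  # training weights drawn on-the-fly
        z = actor(a_hat, w).detach() + eps * torch.randn(adj.size(0), 1)
        sol = greedy_mwis(adj, (w + z).squeeze())  # run the heuristic
        obj = (sol * w.squeeze()).sum()            # achieved MWIS value
        # Critic step: regress the heuristic's objective from (w, z).
        opt_c.zero_grad()
        q = critic(a_hat, torch.cat([w, z], dim=1)).sum()
        (q - obj).pow(2).backward()
        opt_c.step()
        # Actor step: ascend the critic's estimate (deterministic PG).
        opt_a.zero_grad()
        q_pi = critic(a_hat, torch.cat([w, actor(a_hat, w)], dim=1)).sum()
        (-q_pi).backward()
        opt_a.step()
```

The design point this sketch tries to capture, as we read the paper, is that the critic gives a differentiable path around the non-differentiable heuristic, so the actor can be updated with deterministic policy gradients.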
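The Dataset Splits row states that training weights are redrawn on-the-fly while validation weights stay fixed. A small sketch of one way to realize that protocol follows; the function names and the seeding scheme are hypothetical, not taken from the released code.

```python
# Hypothetical helpers for the described split protocol: validation weights
# are fixed once per graph, training weights are fresh on every visit.
import torch

def make_validation_set(graphs, seed=0):
    """Fix validation weights once by drawing them from a seeded generator."""
    g = torch.Generator().manual_seed(seed)
    return [(adj, torch.rand(adj.size(0), 1, generator=g)) for adj in graphs]

def training_sample(adj):
    """Fresh random weights each time a training graph is visited."""
    return adj, torch.rand(adj.size(0), 1)
```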