Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Graph-based Deterministic Policy Gradient for Repetitive Combinatorial Optimization Problems
Authors: Zhongyuan Zhao, Ananthram Swami, Santiago Segarra
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through numerical experiments on several discrete optimization problems, we show that our approach can learn reusable policies to reduce the optimality gap of fast (distributed) heuristics for independent repetitive COPs, and can optimize the long-term objectives for repetitive COPs embedded in graph-based Markov decision processes. |
| Researcher Affiliation | Collaboration | Zhongyuan Zhao, Rice University, zhongyuan.zhao@rice.edu; Ananthram Swami, DEVCOM Army Research Laboratory, EMAIL; Santiago Segarra, Rice University, EMAIL |
| Pseudocode | Yes | Algorithm 1 GDPG-Twin for R-COPs with Independent Instances |
| Open Source Code | Yes | Source code at https://github.com/XzrTGMu/twin-nphard. |
| Open Datasets | Yes | For MWIS problem in Section 4.1, the test set of 500 ER graphs and the corresponding optimal solutions are from https://github.com/zhongyuanzhao/distgcn (Zhao et al., 2022a). |
| Dataset Splits | No | During training, random weights are generated on-the-fly for a training graph, whereas the weights on a validation graph are unchanged. |
| Hardware Specification | Yes | The training takes about an hour on a server with 16 GB memory, 8 CPUs and 1 GPU (Nvidia 1080 Ti). |
| Software Dependencies | No | It should be noted that these runtimes are based on our Python code, of which the implementation could be further optimized for running speed. |
| Experiment Setup | Yes | For the four demonstrated R-COPs with independent instances, we set the hyperparameters of training procedure in Algorithm 1 as follows: αp = αc = 0.0001, E = 25, B = 100, ϵ = 0.15. |
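The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is illustrative only: the key names are hypothetical (the paper uses the symbols αp, αc, E, B, ϵ), and only the values come from the quoted text.

```python
# Hypothetical training configuration for Algorithm 1 (GDPG-Twin).
# Key names are illustrative; values are taken from the paper's
# reported setup for the four R-COPs with independent instances.
train_config = {
    "actor_lr": 1e-4,         # alpha_p: actor (policy) learning rate
    "critic_lr": 1e-4,        # alpha_c: critic learning rate
    "epochs": 25,             # E: number of training epochs
    "batch_size": 100,        # B: batch size
    "exploration_eps": 0.15,  # epsilon: exploration parameter
}
```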