Graph Reinforcement Learning for Network Control via Bi-Level Optimization
Authors: Daniele Gammelli, James Harrison, Kaidi Yang, Marco Pavone, Filipe Rodrigues, Francisco C. Pereira
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework. We show that our approach is highly performant, scalable, and robust to changes in operating conditions and network topologies, both on artificial test problems, as well as real-world problems, such as supply chain inventory control and dynamic vehicle routing. |
| Researcher Affiliation | Collaboration | ¹Stanford University, ²Google Research, Brain Team, ³National University of Singapore, ⁴Technical University of Denmark. |
| Pseudocode | No | The paper describes methods with equations and text but does not include a clearly labeled pseudocode block or algorithm. |
| Open Source Code | Yes | ¹Code available at: https://github.com/DanieleGammelli/graph-rl-for-network-optimization |
| Open Datasets | Yes | The case studies in our experiments are generated using trip record datasets, which we provide together with our codebase. |
| Dataset Splits | No | The paper describes dynamic environments and how data is generated within these environments (e.g., stochastic demand), but it does not specify explicit train/validation/test dataset splits for static datasets. |
| Hardware Specification | Yes | All methods used the same CPU resources, namely an AMD Ryzen Threadripper 2950X (16-Core, 32 Thread, 40M Cache, 3.4 GHz base). |
| Software Dependencies | No | All RL modules were implemented using PyTorch (Paszke et al., 2019) and the IBM CPLEX solver (IBM, 1987) for the optimization problem. (The years refer to publication dates of the references, not software versions.) |
| Experiment Setup | Yes | In our experiments, the resulting policy proved to be broadly insensitive to values of λ, with λ ∈ [15, 30] typically being an effective range. In all our experiments, we use two layers of 32 hidden units and an output layer mapping to the output’s support (e.g., a scalar value for the critic network). (A code sketch of this architecture follows the table.) |
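
For concreteness, the following minimal PyTorch sketch shows a network matching the quoted setup: two hidden layers of 32 units feeding an output layer that maps to a scalar value, as for the critic. The input dimension, ReLU activations, and the class name `Critic` are assumptions for illustration, not details taken from the paper or its codebase.

```python
import torch
import torch.nn as nn


class Critic(nn.Module):
    """Value network sketch: two hidden layers of 32 units, scalar output.

    `in_dim` (the state-embedding size) is a hypothetical parameter;
    the paper does not specify it in the quoted passage.
    """

    def __init__(self, in_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),   # first hidden layer (32 units)
            nn.ReLU(),                       # activation choice is assumed
            nn.Linear(hidden_dim, hidden_dim),  # second hidden layer (32 units)
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),        # output layer -> scalar value
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Usage: a batch of 4 states with a (hypothetical) 16-dim embedding.
# values = Critic(in_dim=16)(torch.randn(4, 16))  # -> shape (4, 1)
```

For other output heads described in the quote, only the final `nn.Linear` would change to map onto that output's support instead of a scalar.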