Graph Reinforcement Learning for Network Control via Bi-Level Optimization

Authors: Daniele Gammelli, James Harrison, Kaidi Yang, Marco Pavone, Filipe Rodrigues, Francisco C. Pereira

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework. We show that our approach is highly performant, scalable, and robust to changes in operating conditions and network topologies, both on artificial test problems, as well as real-world problems, such as supply chain inventory control and dynamic vehicle routing.
Researcher Affiliation | Collaboration | 1 Stanford University, 2 Google Research, Brain Team, 3 National University of Singapore, 4 Technical University of Denmark.
Pseudocode | No | The paper describes methods with equations and text but does not include a clearly labeled pseudocode block or algorithm.
Open Source Code | Yes | Code available at: https://github.com/DanieleGammelli/graph-rl-for-network-optimization
Open Datasets | Yes | The case studies in our experiments are generated using trip record datasets, which we provide together with our codebase.
Dataset Splits | No | The paper describes dynamic environments and how data is generated within these environments (e.g., stochastic demand), but it does not specify explicit train/validation/test dataset splits for static datasets.
Hardware Specification | Yes | All methods used the same computational CPU resources, namely an AMD Ryzen Threadripper 2950X (16-Core, 32-Thread, 40M Cache, 3.4 GHz base).
Software Dependencies | No | All RL modules were implemented using PyTorch (Paszke et al., 2019) and the IBM CPLEX solver (IBM, 1987) for the optimization problem. (The years refer to publication dates of the references, not software versions.)
Experiment Setup | Yes | In our experiments, the resulting policy proved to be broadly insensitive to values of λ, with λ ∈ [15, 30] typically being an effective range. In all our experiments, we use two layers of 32 hidden units and an output layer mapping to the output’s support (e.g., a scalar value for the critic network).
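For orientation, the sketch below illustrates in PyTorch the value-network layout quoted in the Experiment Setup row: two hidden layers of 32 units followed by an output layer mapping to the output’s support (a scalar for the critic). It is a minimal, hedged reconstruction; the input dimension, ReLU activations, and the `CriticHead` name are illustrative assumptions and are not taken from the authors' released code.

```python
# Minimal sketch (not the authors' implementation): a critic head with two
# hidden layers of 32 units and a scalar output, as described in the
# "Experiment Setup" row. Input dimension and activations are assumptions.
import torch
import torch.nn as nn


class CriticHead(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar value estimate for the critic
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Example usage with a hypothetical 16-dimensional state/graph embedding:
critic = CriticHead(in_dim=16)
value = critic(torch.randn(4, 16))  # -> tensor of shape (4, 1)
```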