Graph Reinforcement Learning for Network Control via Bi-Level Optimization

Authors: Daniele Gammelli, James Harrison, Kaidi Yang, Marco Pavone, Filipe Rodrigues, Francisco C. Pereira

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We further highlight a collection of desirable features to system designers, investigate design decisions, and present experiments on real-world control problems showing the utility, scalability, and flexibility of our framework. We show that our approach is highly performant, scalable, and robust to changes in operating conditions and network topologies, both on artificial test problems, as well as real-world problems, such as supply chain inventory control and dynamic vehicle routing.
Researcher Affiliation | Collaboration | 1 Stanford University, 2 Google Research, Brain Team, 3 National University of Singapore, 4 Technical University of Denmark.
Pseudocode | No | The paper describes methods with equations and text but does not include a clearly labeled pseudocode block or algorithm.
Open Source Code | Yes | Code available at: https://github.com/DanieleGammelli/graph-rl-for-network-optimization
Open Datasets | Yes | The case studies in our experiments are generated using trip record datasets, which we provide together with our codebase.
Dataset Splits | No | The paper describes dynamic environments and how data is generated within these environments (e.g., stochastic demand), but it does not specify explicit train/validation/test dataset splits for static datasets.
Hardware Specification | Yes | All methods used the same computational CPU resources, namely an AMD Ryzen Threadripper 2950X (16-Core, 32-Thread, 40M Cache, 3.4 GHz base).
Software Dependencies | No | All RL modules were implemented using PyTorch (Paszke et al., 2019) and the IBM CPLEX solver (IBM, 1987) for the optimization problem. (The years refer to publication dates of the references, not software versions.)
Experiment Setup | Yes | In our experiments, the resulting policy proved to be broadly insensitive to values of λ, with λ ∈ [15, 30] typically being an effective range. In all our experiments, we use two layers of 32 hidden units and an output layer mapping to the output’s support (e.g., a scalar value for the critic network).
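For orientation, the sketch below illustrates in PyTorch the value-network layout quoted in the Experiment Setup row: two hidden layers of 32 units followed by an output layer mapping to the output’s support (a scalar for the critic). It is a minimal, hedged reconstruction; the input dimension, ReLU activations, and the `CriticHead` name are illustrative assumptions and are not taken from the authors' released code.

```python
# Minimal sketch (not the authors' implementation): a critic head with two
# hidden layers of 32 units and a scalar output, as described in the
# "Experiment Setup" row. Input dimension and activations are assumptions.
import torch
import torch.nn as nn


class CriticHead(nn.Module):
    def __init__(self, in_dim: int, hidden_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),  # scalar value estimate for the critic
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


# Example usage with a hypothetical 16-dimensional state/graph embedding:
critic = CriticHead(in_dim=16)
value = critic(torch.randn(4, 16))  # -> tensor of shape (4, 1)
```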