Cellular Network Traffic Scheduling With Deep Reinforcement Learning

Authors: Sandeep Chinchali, Pan Hu, Tianshu Chu, Manu Sharma, Manu Bansal, Rakesh Misra, Marco Pavone, Sachin Katti

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using 4 weeks of real network data from downtown Melbourne, Australia spanning diverse traffic patterns, we demonstrate that our RL scheduler can enable mobile networks to carry 14.7% more data with minimal impact on existing traffic, and outperforms heuristic schedulers by more than 2x. Our work is a valuable step towards designing autonomous, self-driving networks that learn to manage themselves from past data.
Researcher Affiliation | Collaboration | Sandeep Chinchali (1), Pan Hu (2), Tianshu Chu (3), Manu Sharma (3), Manu Bansal (3), Rakesh Misra (3), Marco Pavone (4), Sachin Katti (1, 2). (1) Department of Computer Science, Stanford University; (2) Department of Electrical Engineering, Stanford University; (3) Uhana, Inc.; (4) Department of Aeronautics and Astronautics, Stanford University.
Pseudocode | No | The paper does not include pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | We implement a variant of the DDPG RL algorithm with Google's TensorFlow (Abadi et al. 2016) and build a novel network simulator using OpenAI's Gym environment (Brockman et al. 2016). The simulator code is publicly available at https://bitbucket.org/sandeep_chinchali/aaai18_deeprlcell. (A hedged sketch of a Gym-style environment interface appears after this table.)
Open Datasets | No | Network traces were collected in cooperation with a major operator, spanning 4 weeks of data from 10 diverse cells in Melbourne, Australia. Anonymized user data is used to calculate user- and cell-level performance metrics. The paper relies on this proprietary data and does not provide access information for it.
Dataset Splits | Yes | The experiments are conducted on 27 Melbourne cell-day pairs (19 training days, 8 test days). For each cell, we fit a separate RF throughput model unknown to the agent and train DDPG using a reward function where the throughput limit L is set approximately to the median of B in the training data. Each training episode is a simulation of a particular cell-day pair, and we observe stable convergence within 200 episodes in all cases. (An illustrative sketch of the split and of setting L follows the table.)
Hardware Specification | No | The paper states that numerical experiments took "several hours on a modern multicore server" but does not provide specific hardware details such as CPU model, GPU model, or memory.
Software Dependencies | No | The paper mentions using Google's TensorFlow and OpenAI's Gym environment but does not specify version numbers for these dependencies, which are crucial for reproducibility.
Experiment Setup | Yes | We use standard parameters for DDPG. The neural networks have two hidden layers of sizes 400 and 300. The actor and critic networks have learning rates of 0.0001 and 0.001, and L2-norm regularization weights of 0 and 0.001, respectively. The network architecture was tuned using validation days, and control performance did not improve beyond two hidden layers. The discount factor is 0.99 and the minibatch size is 32. (A hedged TensorFlow sketch of this configuration is given after the table.)
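
The authors' simulator lives in the Bitbucket repository cited above and is not reproduced here. As a rough illustration of what a Gym-based cell-traffic environment interface looks like, the sketch below is a hypothetical, simplified stand-in: the class name, observation layout, and reward shape are assumptions rather than code from the paper, and the reward only loosely mirrors the paper's trade-off between scheduled low-priority traffic and the throughput limit L.

```python
import numpy as np
import gym
from gym import spaces


class CellTrafficEnv(gym.Env):
    """Hypothetical sketch of a Gym environment for cell-level traffic scheduling.

    Observation: a vector of per-timestep cell KPIs taken from a recorded trace.
    Action: a continuous rate in [0, 1] at which low-priority traffic is admitted.
    """

    def __init__(self, trace, throughput_limit):
        super().__init__()
        self.trace = trace              # 2-D array: one row of cell KPIs per timestep
        self.L = throughput_limit       # threshold on existing-user throughput
        self.t = 0
        self.action_space = spaces.Box(low=0.0, high=1.0, shape=(1,), dtype=np.float32)
        self.observation_space = spaces.Box(low=-np.inf, high=np.inf,
                                            shape=(trace.shape[1],), dtype=np.float32)

    def reset(self):
        # Classic Gym API: reset returns only the initial observation.
        self.t = 0
        return self.trace[self.t].astype(np.float32)

    def step(self, action):
        rate = float(np.clip(action, 0.0, 1.0).flatten()[0])
        kpis = self.trace[self.t]
        throughput = kpis[0]            # assume column 0 holds user throughput B
        # Simplified stand-in for the paper's reward: credit the scheduled traffic,
        # penalize pushing existing-user throughput below the limit L.
        reward = rate - max(0.0, self.L - throughput)
        self.t += 1
        done = self.t >= len(self.trace)
        obs = self.trace[min(self.t, len(self.trace) - 1)].astype(np.float32)
        return obs, reward, done, {}
```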
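For the dataset-split row, the following pandas sketch shows one way the 19/8 cell-day split and the per-cell threshold L (the approximate median of throughput B over training days) could be derived. The file path and column names (cell_id, day, B) are placeholders; the underlying traces are proprietary and not available.

```python
import pandas as pd

# Hypothetical trace table: one row per (cell, day, timestep) with a throughput column "B".
traces = pd.read_csv("melbourne_cell_traces.csv")  # placeholder path, data not public

# Split the 27 cell-day pairs into 19 training and 8 test days, as reported in the paper.
cell_days = traces[["cell_id", "day"]].drop_duplicates()
shuffled = cell_days.sample(frac=1.0, random_state=0)
train_days, test_days = shuffled.iloc[:19], shuffled.iloc[19:]

train = traces.merge(train_days, on=["cell_id", "day"])

# Set the reward threshold L per cell to roughly the median throughput B
# observed on that cell's training days.
L_per_cell = train.groupby("cell_id")["B"].median()
print(L_per_cell)
```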
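For the experiment-setup row, the sketch below encodes the stated DDPG hyperparameters (hidden layers of 400 and 300 units, actor/critic learning rates of 1e-4 and 1e-3, L2 regularization of 1e-3 on the critic only, discount 0.99, minibatch size 32) in TensorFlow/Keras. The paper used TensorFlow without stating a version; this Keras phrasing, the ReLU/tanh activations, and the use of Adam are illustrative assumptions, not the authors' implementation.

```python
import tensorflow as tf

# Hyperparameters stated in the paper's experiment setup.
ACTOR_LR, CRITIC_LR = 1e-4, 1e-3
CRITIC_L2 = 1e-3                 # the actor uses no L2 regularization
HIDDEN = (400, 300)
GAMMA = 0.99
BATCH_SIZE = 32


def make_actor(state_dim, action_dim):
    # Two hidden layers of 400 and 300 units; tanh output bounds the continuous action.
    s = tf.keras.Input(shape=(state_dim,))
    x = tf.keras.layers.Dense(HIDDEN[0], activation="relu")(s)
    x = tf.keras.layers.Dense(HIDDEN[1], activation="relu")(x)
    a = tf.keras.layers.Dense(action_dim, activation="tanh")(x)
    return tf.keras.Model(s, a)


def make_critic(state_dim, action_dim):
    # Q(s, a): state and action concatenated, same layer sizes,
    # with L2 weight regularization of 1e-3 applied to the critic only.
    reg = tf.keras.regularizers.l2(CRITIC_L2)
    s = tf.keras.Input(shape=(state_dim,))
    a = tf.keras.Input(shape=(action_dim,))
    x = tf.keras.layers.Concatenate()([s, a])
    x = tf.keras.layers.Dense(HIDDEN[0], activation="relu", kernel_regularizer=reg)(x)
    x = tf.keras.layers.Dense(HIDDEN[1], activation="relu", kernel_regularizer=reg)(x)
    q = tf.keras.layers.Dense(1, kernel_regularizer=reg)(x)
    return tf.keras.Model([s, a], q)


actor_optimizer = tf.keras.optimizers.Adam(ACTOR_LR)
critic_optimizer = tf.keras.optimizers.Adam(CRITIC_LR)
```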