Queue-Learning: A Reinforcement Learning Approach for Providing Quality of Service

Authors: Majid Raeis, Ali Tizghadam, Alberto Leon-Garcia

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The evaluations are presented for a tandem queueing system with non-exponential inter-arrival and service times, the results of which validate our controller's capability in meeting QoS constraints.
Researcher Affiliation | Academia | Majid Raeis, Ali Tizghadam, Alberto Leon-Garcia; University of Toronto, Canada; {m.raeis, ali.tizghadam, alberto.leongarcia}@utoronto.ca
Pseudocode | No | The paper describes the algorithms and their components (e.g., DDPG, actor-critic) but does not provide pseudocode or a formal algorithm block.
Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described.
Open Datasets | No | The paper states: "In order to perform our experiments, we set-up our own queueing environment in Python." This indicates a custom environment/data generator rather than a publicly available or open dataset with access information provided.
Dataset Splits | No | The paper does not specify exact training, validation, and test dataset splits or methods for creating them beyond stating that it created its own queueing environment.
Hardware Specification | Yes | Our experiments were conducted on a server with one Intel Xeon E5-2640v4 CPU, 128GB memory and a Tesla P100 12GB GPU.
Software Dependencies | No | The algorithm and environment are both implemented in Python, where we have used PyTorch for DDPG implementation. (No version numbers are provided for Python or PyTorch, so the software dependencies are not reproducible.)
Experiment Setup | Yes | The actor and critic networks consist of two hidden layers, each having 64 neurons. We use ReLU activation function for the hidden layers, Tanh activation for the actor's output layer and linear activation function for the critic's output layer. The learning rates in the actor and critic networks are 10⁻⁴ and 10⁻³, respectively. We choose a batch size of 128 and set γ and τ to 0.99 and 10⁻², respectively. For the exploration noise, we add Ornstein-Uhlenbeck process to our actor policy (Lillicrap et al. 2015), with its parameters set to µ = 0, θ = 0.15 and σ decaying from 0.5 to 0.005.
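The Experiment Setup row quotes the architecture and hyperparameters in prose only. The following is a minimal PyTorch sketch of that configuration; the state and action dimensions, the use of Adam optimizers, and the sigma-decay schedule are illustrative assumptions that the quoted text does not specify.

```python
# Hedged sketch of the quoted configuration: 2x64 hidden layers, ReLU hidden
# activations, Tanh actor output, linear critic output, actor/critic learning
# rates of 1e-4/1e-3, batch size 128, gamma = 0.99, tau = 1e-2, and OU noise
# with mu = 0, theta = 0.15. Dimensions and optimizer choice are assumptions.
import numpy as np
import torch
import torch.nn as nn

class Actor(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),  # Tanh output layer
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),  # linear output layer
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

class OUNoise:
    """Ornstein-Uhlenbeck exploration noise; sigma is decayed from 0.5 to 0.005
    by an external schedule during training (the schedule itself is not given)."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.5):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(action_dim, mu)

    def sample(self):
        dx = self.theta * (self.mu - self.state) + self.sigma * np.random.randn(len(self.state))
        self.state = self.state + dx
        return self.state

GAMMA, TAU, BATCH_SIZE = 0.99, 1e-2, 128
state_dim, action_dim = 6, 2                                  # placeholders, not from the paper
actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)     # actor learning rate 1e-4
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)   # critic learning rate 1e-3
noise = OUNoise(action_dim)
```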
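For the Open Datasets row: the paper's custom Python queueing environment is not released, so the sketch below only illustrates, under assumptions made here, what a discrete-time stand-in for a tandem queueing system with non-exponential (gamma) inter-arrival times could look like. The class name, the reset/step interface, the action semantics (number of servers per station), and the reward are all hypothetical, not the paper's formulation.

```python
# Hypothetical stand-in for the (unreleased) queueing environment; every choice
# below (distributions, action meaning, reward, horizon) is an assumption.
import numpy as np

class TandemQueueEnv:
    """Toy discrete-time model of two queueing stations in tandem."""

    def __init__(self, mean_interarrival=1.0, slot_length=1.0, horizon=1000, seed=0):
        self.mean_interarrival = mean_interarrival
        self.slot_length = slot_length
        self.horizon = horizon
        self.rng = np.random.default_rng(seed)

    def reset(self):
        self.t = 0
        self.queues = np.zeros(2, dtype=int)   # backlog at each station
        return self.queues.astype(np.float32)

    def step(self, action):
        # Arrivals in this slot come from a gamma renewal process, i.e. the
        # inter-arrival times are deliberately non-exponential.
        gaps = self.rng.gamma(shape=2.0, scale=self.mean_interarrival / 2.0, size=64)
        arrivals = int(np.searchsorted(np.cumsum(gaps), self.slot_length))

        # Interpret the action as the number of servers allocated per station.
        servers = np.maximum(np.round(np.asarray(action)), 0).astype(int)

        self.queues[0] += arrivals
        served0 = min(self.queues[0], servers[0])
        self.queues[0] -= served0
        self.queues[1] += served0
        served1 = min(self.queues[1], servers[1])
        self.queues[1] -= served1

        # Penalize both resource usage and backlog (a crude proxy for delay).
        reward = -float(servers.sum()) - float(self.queues.sum())

        self.t += 1
        done = self.t >= self.horizon
        return self.queues.astype(np.float32), reward, done, {}
```

A DDPG agent such as the one sketched above would then interact with this environment through the usual reset/step loop.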