Queue-Learning: A Reinforcement Learning Approach for Providing Quality of Service
Authors: Majid Raeis, Ali Tizghadam, Alberto Leon-Garcia
AAAI 2021, pp. 461-468 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The evaluations are presented for a tandem queueing system with non-exponential inter-arrival and service times, the results of which validate our controller's capability in meeting QoS constraints. |
| Researcher Affiliation | Academia | Majid Raeis, Ali Tizghadam, Alberto Leon-Garcia University of Toronto, Canada {m.raeis, ali.tizghadam, alberto.leongarcia}@utoronto.ca |
| Pseudocode | No | The paper describes the algorithms and their components (e.g., DDPG, actor-critic) but does not provide pseudocode or a formal algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement or link for open-source code for the methodology described. |
| Open Datasets | No | The paper states: "In order to perform our experiments, we set-up our own queueing environment in Python." This indicates a custom environment/data generator rather than a publicly available or open dataset with access information provided (an illustrative sketch of such an environment is given after this table). |
| Dataset Splits | No | The paper does not specify exact training, validation, and test dataset splits or methods for creating them beyond stating it created its own queueing environment. |
| Hardware Specification | Yes | Our experiments were conducted on a server with one Intel Xeon E5-2640v4 CPU, 128GB memory and a Tesla P100 12GB GPU. |
| Software Dependencies | No | The algorithm and environment are both implemented in Python, where we have used PyTorch for DDPG implementation. (No version numbers are provided for Python or PyTorch, so the software dependencies cannot be pinned for reproduction.) |
| Experiment Setup | Yes | The actor and critic networks consist of two hidden layers, each having 64 neurons. We use the ReLU activation function for the hidden layers, Tanh activation for the actor's output layer and a linear activation function for the critic's output layer. The learning rates in the actor and critic networks are 10^-4 and 10^-3, respectively. We choose a batch size of 128 and set γ and τ to 0.99 and 10^-2, respectively. For the exploration noise, we add an Ornstein-Uhlenbeck process to our actor policy (Lillicrap et al. 2015), with its parameters set to µ = 0, θ = 0.15 and σ decaying from 0.5 to 0.005. |
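
For orientation, the configuration quoted in the Experiment Setup row can be expressed as a minimal PyTorch sketch. This is not the authors' code (none is released); the state/action dimensions, the choice of Adam as optimizer, the omission of target networks and the replay buffer, and the Ornstein-Uhlenbeck discretization are all assumptions added for illustration.

```python
import numpy as np
import torch
import torch.nn as nn


class Actor(nn.Module):
    """Actor: two 64-unit ReLU hidden layers, Tanh output layer (as reported)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, action_dim), nn.Tanh(),
        )

    def forward(self, state):
        return self.net(state)


class Critic(nn.Module):
    """Critic: two 64-unit ReLU hidden layers, linear output layer (as reported)."""
    def __init__(self, state_dim, action_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))


class OUNoise:
    """Ornstein-Uhlenbeck exploration noise (mu=0, theta=0.15; sigma decayed externally)."""
    def __init__(self, action_dim, mu=0.0, theta=0.15, sigma=0.5):
        self.mu, self.theta, self.sigma = mu, theta, sigma
        self.state = np.full(action_dim, mu)

    def sample(self):
        self.state = self.state + self.theta * (self.mu - self.state) \
            + self.sigma * np.random.standard_normal(self.state.shape)
        return self.state


# Reported hyperparameters; state_dim/action_dim are placeholders (not specified in the
# quoted text) and Adam is an assumed optimizer choice.
state_dim, action_dim = 4, 2
actor, critic = Actor(state_dim, action_dim), Critic(state_dim, action_dim)
actor_optim = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_optim = torch.optim.Adam(critic.parameters(), lr=1e-3)
batch_size, gamma, tau = 128, 0.99, 1e-2
noise = OUNoise(action_dim, mu=0.0, theta=0.15, sigma=0.5)  # sigma decays from 0.5 to 0.005
```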
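
Similarly, since the custom queueing environment noted in the Open Datasets row is not released, the following is only a hypothetical sketch of a discrete-time tandem queue with non-exponential (gamma-distributed) arrivals and service completions. The class name, dynamics, action mapping, reward shaping, and all parameter values are illustrative assumptions, not the paper's environment.

```python
import numpy as np


class TandemQueueEnv:
    """Illustrative discrete-time tandem queue: jobs arrive at stage 1, are served,
    and flow to the next stage; the action sets the number of servers per stage."""

    def __init__(self, n_stages=2, max_servers=10, arrival_rate=5.0, seed=0):
        self.n_stages, self.max_servers = n_stages, max_servers
        self.arrival_rate = arrival_rate
        self.rng = np.random.default_rng(seed)
        self.queues = np.zeros(n_stages)

    def reset(self):
        self.queues[:] = 0.0
        return self.queues.copy()

    def step(self, action, qos_threshold=2.0):
        # Map a Tanh actor output in [-1, 1] to an integer number of servers per stage.
        servers = np.rint((np.asarray(action) + 1.0) / 2.0 * (self.max_servers - 1)) + 1
        # Non-exponential (gamma-distributed) arrivals and per-server service completions.
        arrivals = self.rng.gamma(shape=2.0, scale=self.arrival_rate / 2.0)
        served = np.minimum(self.queues, self.rng.gamma(2.0, 0.5, self.n_stages) * servers)
        self.queues -= served
        self.queues[0] += arrivals
        if self.n_stages > 1:
            self.queues[1:] += served[:-1]
        # Reward: penalize server usage plus violations of a crude QoS (backlog) threshold.
        reward = -servers.sum() - 10.0 * np.maximum(self.queues - qos_threshold, 0.0).sum()
        return self.queues.copy(), reward, False, {}


# Example rollout with a fixed mid-range action.
env = TandemQueueEnv()
state = env.reset()
for _ in range(5):
    state, reward, done, _ = env.step(np.zeros(env.n_stages))
```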