Scheduling of Time-Varying Workloads Using Reinforcement Learning
Authors: Shanka Subhra Mondal, Nikhil Sheoran, Subrata Mitra
AAAI 2021, pp. 9000-9008
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Validations with real production traces from Google and Alibaba show that our technique can significantly improve metrics for operational excellence (e.g. utilization, fragmentation, resource exhaustion etc.) for a cluster compared to the baselines. |
| Researcher Affiliation | Collaboration | Shanka Subhra Mondal¹* (¹Princeton University), Nikhil Sheoran²* (²Adobe Research), Subrata Mitra² (²Adobe Research) |
| Pseudocode | Yes | Algorithm 1 describes the high-level online learning and placement logic. |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its source code, nor does it include a link to a code repository for the described methodology. |
| Open Datasets | Yes | Google traces (Wilkes 2011) contain production workload scheduling requests for a period of 29 days. Alibaba traces (Alibaba 2018) contain production traces from 4k machines over 8 days. Both contain CPU/memory numbers used by each workload at a granularity of 5 minutes, along with scheduling details, e.g., priority, class and original resource request. |
| Dataset Splits | No | The paper states 'For each trace, the training and test set consists of 100 and 30 such distinct job sequences, respectively.' It specifies training and test sets but does not mention a separate validation set. |
| Hardware Specification | Yes | Training and testing used a batch size of 20 examples run in parallel on a 32 core Intel Xeon CPU E5-2686 v4. |
| Software Dependencies | No | The paper mentions software like 'Theano', 'Adam optimizer', 'tsfresh', and 'K-Means clustering algorithm (Pedregosa et al. 2011) with k-means++ initialization', but it does not provide specific version numbers for these software components. |
| Experiment Setup | Yes | The parameters for the state space are M = 10, h = 20, d = 2, C1 = C2 = 8. The weights of the penalty parameters are chosen as Kc = 0.1, Ku = 3, Ko = 30000, Kw = 50. ... Adam optimizer and a learning rate (η) of 0.001. We train using the REINFORCE algorithm (Sutton et al. 2000) with the number of trajectories (N) set to 20 and in an episodic manner (Mnih et al. 2013) for a total of 2000 iterations, with maximum episode length (L) 200. |
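
The quoted setup maps directly onto a standard policy-gradient training loop. Below is a minimal NumPy sketch of REINFORCE with the stated hyperparameters (Adam with η = 0.001, N = 20 trajectories per update, 2000 iterations, maximum episode length L = 200). The environment dynamics, state/action dimensions, and the linear softmax policy are placeholder assumptions; the paper's actual Theano policy network and cluster-trace environment are not reproduced here.

```python
import numpy as np

# Hyperparameters quoted in the paper's setup; everything else below
# (environment, state/action sizes, linear softmax policy) is illustrative.
ETA = 0.001            # learning rate (eta)
N_TRAJ = 20            # trajectories per update (N)
N_ITERS = 2000         # training iterations
EP_LEN = 200           # maximum episode length (L)
STATE_DIM = 16         # assumption: flattened cluster-state features
N_ACTIONS = 10         # assumption: M = 10 candidate machines to place on

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(STATE_DIM, N_ACTIONS))  # linear softmax policy
m, v = np.zeros_like(theta), np.zeros_like(theta)           # Adam moment estimates
B1, B2, EPS = 0.9, 0.999, 1e-8

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def rollout():
    """One episode from a stand-in environment with random dynamics.

    In the paper the per-step reward is the negative weighted sum of
    penalty terms (weights Kc, Ku, Ko, Kw); a random penalty stands in
    for it here so the sketch stays self-contained.
    """
    states, actions, rewards = [], [], []
    s = rng.normal(size=STATE_DIM)
    for _ in range(EP_LEN):
        probs = softmax(s @ theta)
        a = rng.choice(N_ACTIONS, p=probs)
        states.append(s)
        actions.append(a)
        rewards.append(-rng.random())      # placeholder penalty-based reward
        s = rng.normal(size=STATE_DIM)     # placeholder state transition
    return states, actions, rewards

for it in range(N_ITERS):
    grad = np.zeros_like(theta)
    for _ in range(N_TRAJ):
        states, actions, rewards = rollout()
        G = np.cumsum(rewards[::-1])[::-1]   # reward-to-go at each step
        for s, a, g in zip(states, actions, G):
            # gradient of log pi(a|s) for a linear softmax policy:
            # outer(s, onehot(a) - probs)
            dlog = -np.outer(s, softmax(s @ theta))
            dlog[:, a] += s
            grad += g * dlog
    grad /= N_TRAJ
    # Adam ascent step on the policy-gradient estimate
    m = B1 * m + (1 - B1) * grad
    v = B2 * v + (1 - B2) * grad ** 2
    m_hat = m / (1 - B1 ** (it + 1))
    v_hat = v / (1 - B2 ** (it + 1))
    theta += ETA * m_hat / (np.sqrt(v_hat) + EPS)
```

The episodic structure (fixed-length rollouts, gradient averaged over N trajectories per update) follows the quoted description; swapping the linear policy for a neural network and the placeholder reward for the weighted penalty terms would bring the sketch closer to the paper's actual training procedure.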