Scheduling of Time-Varying Workloads Using Reinforcement Learning

Authors: Shanka Subhra Mondal, Nikhil Sheoran, Subrata Mitra

AAAI 2021, pp. 9000-9008

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Validations with real production traces from Google and Alibaba show that our technique can significantly improve metrics for operational excellence (e.g. utilization, fragmentation, resource exhaustion etc.) for a cluster compared to the baselines.
Researcher Affiliation | Collaboration | Shanka Subhra Mondal* (Princeton University), Nikhil Sheoran* (Adobe Research), Subrata Mitra (Adobe Research)
Pseudocode | Yes | Algorithm 1 describes the high-level online learning and placement logic. (A speculative skeleton of such a loop is sketched after this table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing its source code, nor does it include a link to a code repository for the described methodology.
Open Datasets | Yes | Google traces (Wilkes 2011) contain production workload scheduling requests for a period of 29 days. Alibaba traces (Alibaba 2018) contain production traces from 4k machines over 8 days. Both contain the CPU/memory usage of each workload at a granularity of 5 minutes, along with scheduling details such as priority, class, and the original resource request.
Dataset Splits | No | The paper states 'For each trace, the training and test set consists of 100 and 30 such distinct job sequences, respectively.' It specifies training and test sets but does not mention a separate validation set or provide explicit splits for one.
Hardware Specification | Yes | Training and testing used a batch size of 20 examples run in parallel on a 32-core Intel Xeon E5-2686 v4 CPU.
Software Dependencies | No | The paper mentions software such as Theano, the Adam optimizer, tsfresh, and the K-Means clustering algorithm (Pedregosa et al. 2011) with k-means++ initialization, but it does not provide specific version numbers for these components. (A hedged usage sketch of the feature-extraction and clustering step follows this table.)
Experiment Setup | Yes | The parameters for the state space are M = 10, h = 20, d = 2, C1 = C2 = 8. The weights of the penalty parameters are chosen as Kc = 0.1, Ku = 3, Ko = 30000, Kw = 50. ... Adam optimizer and a learning rate (η) of 0.001. We train using the REINFORCE algorithm (Sutton et al. 2000) with the number of trajectories (N) set to 20, in an episodic manner (Mnih et al. 2013), for a total of 2000 iterations, with a maximum episode length (L) of 200. (A hedged training-loop sketch using these hyperparameters follows this table.)
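Since Algorithm 1 is not reproduced on this page, the following is only a speculative skeleton of the kind of online learning-and-placement loop the Pseudocode row refers to. The environment object, the pick_machine/update policy interface, and the periodic-update rule are illustrative assumptions, not the authors' algorithm.

```python
# Speculative skeleton only: none of these names or interfaces come from the
# paper; they stand in for the online learning and placement logic that
# Algorithm 1 is said to describe.

def online_schedule(env, policy, num_steps, update_every=20):
    """Place each arriving job with the current policy, then learn from feedback."""
    state = env.reset()
    experience = []
    for _ in range(num_steps):
        job = env.next_job()                      # next workload request from the trace
        action = policy.pick_machine(state, job)  # choose a machine for the job
        state, reward = env.place(job, action)    # apply the placement, observe reward/penalty
        experience.append((state, action, reward))
        if len(experience) >= update_every:       # periodically refresh the policy online
            policy.update(experience)
            experience.clear()
    return policy
```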
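The dependency list above (tsfresh, scikit-learn's K-Means with k-means++ initialization) suggests a workload-characterization step along the following lines. This is a minimal sketch assuming a long-format trace with job_id/time/cpu columns; the number of clusters, the feature settings, and the column names are illustrative assumptions, not values reported in the paper.

```python
import pandas as pd
from tsfresh import extract_features
from tsfresh.feature_extraction import MinimalFCParameters
from sklearn.cluster import KMeans

# Hypothetical long-format trace: one row per (workload id, timestamp) with
# CPU usage sampled at 5-minute granularity, as in the Google/Alibaba traces.
traces = pd.DataFrame({
    "job_id": [0, 0, 0, 1, 1, 1],
    "time":   [0, 5, 10, 0, 5, 10],
    "cpu":    [0.2, 0.4, 0.3, 0.8, 0.9, 0.7],
})

# Extract per-workload time-series features with tsfresh (minimal feature set
# chosen here only to keep the example small).
features = extract_features(
    traces, column_id="job_id", column_sort="time",
    default_fc_parameters=MinimalFCParameters(),
).fillna(0.0)

# Cluster workloads into usage patterns with k-means++ initialization
# (the number of clusters is an assumption, not a value from the paper).
kmeans = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = kmeans.fit_predict(features)
print(labels)
```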
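For the hyperparameters quoted above (Adam with learning rate 0.001, N = 20 trajectories per update, 2000 iterations, maximum episode length L = 200), a minimal REINFORCE training loop could look like the sketch below. It is written in PyTorch rather than the paper's Theano, assumes a hypothetical gym-style SchedulingEnv exposing reset() and step(action) -> (next_state, reward, done), and uses undiscounted returns-to-go without the paper's penalty shaping or any baseline, so it is a stand-in rather than the authors' implementation.

```python
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Small MLP policy over placement actions (architecture is an assumption)."""
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return torch.distributions.Categorical(logits=self.net(state))

def train(env, state_dim, num_actions,
          iterations=2000, n_trajectories=20, max_len=200, lr=1e-3):
    # Defaults mirror the values quoted in the Experiment Setup row.
    policy = PolicyNet(state_dim, num_actions)
    optimizer = torch.optim.Adam(policy.parameters(), lr=lr)

    for _ in range(iterations):
        losses = []
        for _ in range(n_trajectories):
            state, log_probs, rewards = env.reset(), [], []
            for _ in range(max_len):
                dist = policy(torch.as_tensor(state, dtype=torch.float32))
                action = dist.sample()
                log_probs.append(dist.log_prob(action))
                state, reward, done = env.step(action.item())  # assumed env interface
                rewards.append(reward)
                if done:
                    break
            # Undiscounted return-to-go for each step (a simplification).
            returns, g = [], 0.0
            for r in reversed(rewards):
                g += r
                returns.append(g)
            returns = torch.tensor(list(reversed(returns)), dtype=torch.float32)
            losses.append(-(torch.stack(log_probs) * returns).sum())
        # Average the policy-gradient loss over the N sampled trajectories.
        loss = torch.stack(losses).mean()
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return policy
```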