reproducibilityindex.ai

Hindsight Learning for MDPs with Exogenous Inputs

Authors: Sean R. Sinclair, Felipe Vieira Frujeri, Ching-An Cheng, Luke Marshall, Hugo De Oliveira Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods.
Researcher Affiliation	Collaboration	Sean R. Sinclair (1), Felipe Frujeri (2), Ching-An Cheng (2), Luke Marshall (2), Hugo Barbalho (2), Jingling Li (3), Jennifer Neville (2), Ishai Menache (2), Adith Swaminathan (2). (1) School of Operations Research and Information Engineering, Cornell University; (2) Microsoft Research, Redmond; (3) Department of Computer Science, University of Maryland.
Pseudocode	Yes	Algorithm 1 Hindsight Learning, Algorithm 2 Training Procedure in MARO, Algorithm 3 Hindsight Heuristic.
Open Source Code	Yes	We evaluate Hindsight Learning on three resource management domains with different characteristics (our code is available at https://github.com/seanrsinclair/hindsightlearning).
Open Datasets	Yes	In this scenario the VM requests are uniformly sampled from the 2019 snapshot of the Azure Public Dataset (Cortez et al., 2017).
Dataset Splits	Yes	The first half of the resulting trace is used for training and the remaining trace for testing. Our total trace period was 88 days, and we used the exact methodology as in 7.3.1 to obtain the training and test datasets.
Hardware Specification	No	The paper states that experiments were conducted using the 'Microsoft Azure ML training platform' and 'same compute hardware' but does not provide specific details on CPU models, GPU models, or memory specifications.
Software Dependencies	No	The paper mentions using 'PyTorch (Paszke et al., 2019)' and 'MARO (Jiang et al., 2020)' but does not specify their exact version numbers, which are required for reproducible software dependencies.
Experiment Setup	Yes	All the ML methods use a 4-layer neural net to map features describing a PM and the VM request to a score. In Appendix G, we detail the network design, state features and the hyper-parameter ranges we used. We use a four layer neural network with (32, 16, 8) hidden dimensions, an output dimension of one (due to the action-dependent features), and Leaky ReLU activation functions. For each of the algorithms we use the RMSprop optimization algorithm. Table 6: List of hyperparameters tuned over for the Sim2Real RL and Hindsight Learning algorithms.
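
The scoring network described in the Experiment Setup row can be sketched as follows. This is a minimal, dependency-free illustration and not the authors' PyTorch implementation. The input feature size, the weight initialisation, the Leaky ReLU slope of 0.01, and the names `init_layers`, `score`, and `pick_action` are all assumptions made for the example; only the (32, 16, 8) hidden dimensions, the single output, and the Leaky ReLU activations come from the quoted text. Training with RMSprop is not shown.

```python
import random

HIDDEN_DIMS = (32, 16, 8)  # hidden sizes quoted in the Experiment Setup row
OUTPUT_DIM = 1             # one score per (PM, VM request) feature vector
NEGATIVE_SLOPE = 0.01      # assumed Leaky ReLU slope; not stated in the source


def init_layers(input_dim, seed=0):
    """Build four dense layers: input_dim -> 32 -> 16 -> 8 -> 1.

    The uniform initialisation below is an assumption for illustration.
    """
    rng = random.Random(seed)
    dims = (input_dim, *HIDDEN_DIMS, OUTPUT_DIM)
    layers = []
    for fan_in, fan_out in zip(dims[:-1], dims[1:]):
        bound = fan_in ** -0.5
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def leaky_relu(x):
    """Pass positive values through; scale negative values by the slope."""
    return x if x > 0 else NEGATIVE_SLOPE * x


def score(layers, features):
    """Map one action-dependent feature vector to a scalar score."""
    activations = list(features)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        activations = [sum(w * a for w, a in zip(row, activations)) + b
                       for row, b in zip(weights, biases)]
        if index < last:  # no activation on the output layer
            activations = [leaky_relu(v) for v in activations]
    return activations[0]


def pick_action(layers, candidate_features):
    """Return the index of the candidate (e.g. a PM) with the highest score."""
    scores = [score(layers, f) for f in candidate_features]
    return max(range(len(scores)), key=scores.__getitem__)
```

Because the features are action-dependent, the same network is applied once per candidate placement and the highest-scoring candidate is chosen, which is why a single output unit suffices.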