Hindsight Learning for MDPs with Exogenous Inputs
Authors: Sean R. Sinclair, Felipe Vieira Frujeri, Ching-An Cheng, Luke Marshall, Hugo De Oliveira Barbalho, Jingling Li, Jennifer Neville, Ishai Menache, Adith Swaminathan
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare HL against classic baselines in the multi-secretary and airline revenue management problems. We also scale our algorithms to a business-critical cloud resource management problem allocating Virtual Machines (VMs) to physical machines, and simulate their performance with real datasets from a large public cloud provider. We find that HL algorithms outperform domain-specific heuristics, as well as state-of-the-art reinforcement learning methods. |
| Researcher Affiliation | Collaboration | Sean R. Sinclair (1), Felipe Frujeri (2), Ching-An Cheng (2), Luke Marshall (2), Hugo Barbalho (2), Jingling Li (3), Jennifer Neville (2), Ishai Menache (2), Adith Swaminathan (2). (1) School of Operations Research and Information Engineering, Cornell University; (2) Microsoft Research, Redmond; (3) Department of Computer Science, University of Maryland. |
| Pseudocode | Yes | Algorithm 1 Hindsight Learning, Algorithm 2 Training Procedure in MARO, Algorithm 3 Hindsight Heuristic. |
| Open Source Code | Yes | We evaluate Hindsight Learning on three resource management domains with different characteristics (our code is available at https://github.com/seanrsinclair/hindsightlearning). |
| Open Datasets | Yes | In this scenario the VM requests are uniformly sampled from the 2019 snapshot of the Azure Public Dataset (Cortez et al., 2017). |
| Dataset Splits | Yes | The first half of the resulting trace is used for training and the remaining trace for testing. Our total trace period was 88 days, and we used the exact methodology as in 7.3.1 to obtain the training and test datasets. |
| Hardware Specification | No | The paper states that experiments were conducted using the 'Microsoft Azure ML training platform' and 'same compute hardware' but does not provide specific details on CPU models, GPU models, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'PyTorch (Paszke et al., 2019)' and 'MARO (Jiang et al., 2020)' but does not specify their exact version numbers, which are required for reproducible software dependencies. |
| Experiment Setup | Yes | All the ML methods use a 4-layer neural net to map features describing a PM and the VM request to a score. In Appendix G, we detail the network design, state features and the hyper-parameter ranges we used. We use a four layer neural network with (32, 16, 8) hidden dimensions, an output dimension of one (due to the action-dependent features), and Leaky ReLU activation functions. For each of the algorithms we use the RMSprop optimization algorithm. Table 6: List of hyperparameters tuned over for the Sim2Real RL and Hindsight Learning algorithms. |
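
The scoring network described in the Experiment Setup row can be sketched roughly as follows. This is a minimal, dependency-free illustration of the forward pass only, not the authors' implementation: the paper uses PyTorch with RMSprop, while the input feature dimension, the Leaky ReLU slope, the weight initialization, and the rule of picking the highest-scoring PM are all assumptions made here for illustration. The names `init_layers`, `score`, and `choose_pm` are hypothetical.

```python
import random

HIDDEN_DIMS = (32, 16, 8)   # hidden sizes quoted in the table
OUTPUT_DIM = 1              # one score per (PM, VM request) feature vector
NEGATIVE_SLOPE = 0.01       # assumption: the slope is not stated in the excerpt


def leaky_relu(x, slope=NEGATIVE_SLOPE):
    """Leaky ReLU: identity for positive inputs, small slope otherwise."""
    return x if x > 0 else slope * x


def init_layers(input_dim, seed=0):
    """Build four linear layers: input -> 32 -> 16 -> 8 -> 1.

    The uniform initialization is an illustrative assumption.
    """
    rng = random.Random(seed)
    sizes = (input_dim, *HIDDEN_DIMS, OUTPUT_DIM)
    layers = []
    for fan_in, fan_out in zip(sizes[:-1], sizes[1:]):
        bound = fan_in ** -0.5
        weights = [[rng.uniform(-bound, bound) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        biases = [0.0] * fan_out
        layers.append((weights, biases))
    return layers


def score(layers, features):
    """Map one action-dependent feature vector to a scalar score."""
    activations = list(features)
    last = len(layers) - 1
    for index, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activations)) + b
               for row, b in zip(weights, biases)]
        # Leaky ReLU on hidden layers; the output layer stays linear.
        activations = pre if index == last else [leaky_relu(v) for v in pre]
    return activations[0]


def choose_pm(layers, candidate_features):
    """Assumed selection rule: return the index of the highest-scoring PM."""
    scores = [score(layers, f) for f in candidate_features]
    return max(range(len(scores)), key=scores.__getitem__)
```

As a usage example, `choose_pm(init_layers(6), candidates)` takes a list of 6-dimensional feature vectors, one per candidate PM, and returns the index of the candidate with the largest score. Because each candidate is scored separately, the same network handles any number of candidate PMs, which is consistent with the single output dimension mentioned in the table.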