Infinity Learning: Learning Markov Chains from Aggregate Steady-State Observations
Authors: Jianfei Gao, Mohamed A. Zahran, Amit Sheoran, Sonia Fahmy, Bruno Ribeiro (pp. 3922–3929)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply ∞-SGD to a real-world testbed and synthetic experiments showcasing its accuracy, its ability to extrapolate the steady state distribution to unobserved states under unobserved conditions (heavy loads, when training under light loads), and its success in difficult scenarios where even a tailor-made extension of existing methods fails. |
| Researcher Affiliation | Academia | Jianfei Gao, Mohamed A. Zahran, Amit Sheoran, Sonia Fahmy, Bruno Ribeiro; Department of Computer Science, Purdue University, 305 N. University St, West Lafayette, IN 47907; {gao462, mzahran, asheoran}@purdue.edu, {fahmy, ribeiro}@cs.purdue.edu |
| Pseudocode | No | The paper describes mathematical derivations and theoretical concepts, but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-sourcing of the code for the described methodology. |
| Open Datasets | No | The training data consists of custom-generated data: 'The training data (86 time windows) is generated under light loads...' for the testbed, and 'Structures used to simulate data are provided in Supplementary Material A3' for synthetic experiments. No concrete access information (link, DOI, or formal citation) to these datasets is provided for public availability. |
| Dataset Splits | No | The paper mentions 'training data' and 'test data' but does not explicitly provide details about a separate validation split or the percentages/counts for the data partitioning. |
| Hardware Specification | No | The paper describes the setup of the testbed system being emulated ('single server with a waiting queue of size K = 20'), but it does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments or simulations. |
| Software Dependencies | No | The paper mentions 'PyTorch's autodiff function' but does not specify a version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | In most of our simulations, we set t = 128 = 2^7 and p_events^(0) = 1^T/\|S\| throughout all our experiments. We also tested BPTT without divide-and-conquer but found the optimization unstable due to the long backpropagation paths. We tested t ∈ {16, 128} and found that smaller values of t are easier to optimize but, as expected, generally produce worse approximations of the steady state for heavy loads. In most of our experiments, p = 0.1; that is, on average we consider only the first ten terms in the sum of Equation (10). Contrast ∞-SGD's ten summation terms, with matrix powers that need no chain rule, with the baseline DC-BPTT approach (Definition 2), where P(Q(x, θ))^128 needs to be computed together with a chain rule to compute gradients over the matrix multiplications. It is no surprise that ∞-SGD is a more stable optimization method (no vanishing or exploding gradients); interestingly, ∞-SGD also works well on the tested slow-mixing CTMCs, while baseline methods like DC-BPTT fail in these scenarios (see Supplementary Material C5). |
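The quoted setup describes a randomly truncated infinite sum: with p = 0.1, on average only the first ten terms of an infinite series are evaluated, yet the estimate remains unbiased. Below is a minimal sketch of that general technique (a Russian-roulette / geometric-truncation estimator), not the paper's exact Equation (10); the function names and the geometric-series example are illustrative assumptions.

```python
import numpy as np

def truncated_series_estimate(term, p=0.1, rng=None):
    """Unbiased estimate of sum_{n>=0} term(n) via random truncation.

    Sample N ~ Geometric(p) (number of failures before first success),
    keep terms n = 0..N, and reweight each by 1 / P(N >= n) = (1-p)^-n
    so the estimator's expectation equals the full infinite sum.
    With p = 0.1, E[N + 1] = 10 terms are evaluated on average,
    matching the 'first ten terms on average' in the quoted setup.
    """
    rng = np.random.default_rng() if rng is None else rng
    N = rng.geometric(p) - 1  # NumPy counts trials; convert to failures
    return sum(term(n) / (1.0 - p) ** n for n in range(N + 1))

# Sanity check on a series with a known value: sum_{n>=0} 0.5^n = 2.
rng = np.random.default_rng(0)
estimates = [truncated_series_estimate(lambda n: 0.5 ** n, p=0.1, rng=rng)
             for _ in range(20000)]
print(np.mean(estimates))  # close to 2.0
```

The appeal over the DC-BPTT baseline noted in the row above is that each sampled term is a plain matrix power with no long backpropagation chain, avoiding the vanishing/exploding gradients that arise when differentiating through 128 chained matrix multiplications.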