Probability Paths and the Structure of Predictions over Time
Authors: Zhiyuan Jerry Lin, Hao Sheng, Sharad Goel
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We now explore the efficacy of GLIM through a series of experiments on two real-world datasets. We have additionally included a simulation study on a synthetic dataset in Appendix B to demonstrate GLIM's empirical finite-sample behavior. In all of our experiments, we use a covariance matrix Σ(X, θ) with autoregressive structure and heteroskedastic variance. We compare GLIM against three representative baselines, one from each of the three classes of models described in Section 2: (1) MMFE: martingale method of forecast evolution [Heath and Jackson, 1994, Zhao et al., 2013]; (2) LR: a set of linear regression models {m_t} that predict the future estimated probability at time t [Brockwell and Davis, 2016]; and (3) MQLSTM: a Bayesian multi-horizon quantile LSTM model [Wen et al., 2017, Eisenach et al., 2020]. As displayed in the plots, GLIM outperforms all three baselines across the board. In particular, Figures 4a and 5a are plotted on a log scale, indicating that GLIM outperforms the baselines by a few orders of magnitude on those metrics. |
| Researcher Affiliation | Collaboration | Zhiyuan Jerry Lin (Facebook, zylin@fb.com); Hao Sheng (Stanford University, haosheng@cs.stanford.edu); Sharad Goel (Harvard University, sgoel@hks.harvard.edu) |
| Pseudocode | No | The paper describes multi-step procedures for model inference and drawing probability paths but does not present them in structured pseudocode or an algorithm block explicitly labeled as such. |
| Open Source Code | Yes | Code to replicate our experiments is available online at: https://github.com/ItsMrLin/probability-paths. |
| Open Datasets | Yes | Specifically, we use a dataset of Australian rainfall observations [Williams, 2011, Young and Young, 2018], and construct daily predictions starting seven days in advance of the target date. Kaggle: Rain in Australia, 2018. https://www.kaggle.com/jsphyg/weather-dataset-rattle-package. |
| Dataset Splits | No | For the basketball dataset, the paper states: 'training our model on the first season, and evaluating our predictions on the second' (2017-2018 for training, 2018-2019 for evaluation). For the weather dataset, it states: 'We randomly sampled 10,000 target dates in the dataset prior to 2014 for training our models, and randomly sampled 10,000 target dates in or after 2014 for testing.' There is no explicit mention of a separate validation split. (A sketch of the weather split appears after the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU, CPU models, or cloud computing specifications) used to run the experiments. |
| Software Dependencies | No | The paper mentions that Hamiltonian Monte Carlo (HMC) is 'as implemented in Stan [Carpenter et al., 2017]', but it does not specify the version number of Stan or any other software dependencies with their versions. |
| Experiment Setup | Yes | In all of our experiments, we use a covariance matrix Σ(X, θ) with autoregressive structure and heteroskedastic variance. Specifically, we set Σ(i,j) = σ_i σ_j ρ^|i−j|, where the variance at time t is σ_t² = G_β(X, t). In one application we use a regularized linear function G_β(X, t) for GLIM, described in detail in Appendix C.1; in the other we set ρ = 0 and use a regularized quadratic function G_β(X, t), described in detail in Appendix C.2. For each model, all metrics are calculated using 100 simulated samples per probability path. Without further constraints, θ is not fully identified by the data, since multiplying all of the latent variables in Eq. (1) by a positive constant does not affect the sign of the relevant expression. Thus, in our applications below, we constrain the scale of the latent variables by requiring Var(Z_1) = σ_1² = 1. (A sketch of this covariance construction appears after the table.) |
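
The covariance construction quoted in the Experiment Setup row can be made concrete. Below is a minimal Python sketch, assuming the autoregressive form Σ(i,j) = σ_i σ_j ρ^|i−j|; the horizon length, the value of ρ, and the placeholder variances standing in for G_β(X, t) are illustrative assumptions, with only the identification constraint σ_1² = 1 taken from the paper.

```python
import numpy as np

def make_covariance(sigmas: np.ndarray, rho: float) -> np.ndarray:
    """Build Sigma with Sigma[i, j] = sigma_i * sigma_j * rho**|i - j|."""
    n = len(sigmas)
    lags = np.abs(np.arange(n)[:, None] - np.arange(n)[None, :])
    return np.outer(sigmas, sigmas) * rho ** lags

# Hypothetical heteroskedastic variances over n = 7 horizons; a linear
# ramp stands in for the paper's regularized G_beta(X, t) (Appendix C.1).
n = 7
sigmas = np.sqrt(np.linspace(1.0, 2.5, n))
sigmas[0] = 1.0  # identification constraint: Var(Z_1) = sigma_1^2 = 1
Sigma = make_covariance(sigmas, rho=0.6)  # rho = 0.6 is illustrative

# Draw latent Gaussian vectors Z ~ N(0, Sigma), using 100 samples to
# mirror the paper's "100 simulated samples per probability path".
rng = np.random.default_rng(0)
Z = rng.multivariate_normal(np.zeros(n), Sigma, size=100)
print(Z.shape)  # (100, 7)
```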
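
Similarly, the date-based split reported for the weather data can be sketched. This assumes the Kaggle `weatherAUS.csv` file with a parseable `Date` column; sampling rows here stands in for the paper's sampling of target dates, and the file name, column name, and random seed are assumptions rather than details from the paper.

```python
import pandas as pd

# Load the Kaggle "Rain in Australia" data (assumed file/column names).
df = pd.read_csv("weatherAUS.csv", parse_dates=["Date"])

# Split at the 2014 cutoff described in the paper: training dates fall
# before 2014, test dates in or after 2014.
before_2014 = df[df["Date"] < "2014-01-01"]
from_2014 = df[df["Date"] >= "2014-01-01"]

# Randomly sample 10,000 target dates on each side of the cutoff.
train = before_2014.sample(n=10_000, random_state=0)
test = from_2014.sample(n=10_000, random_state=0)
print(len(train), len(test))  # 10000 10000
```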