Interpretable Models for Understanding Immersive Simulations
Authors: Nicholas Hoernle, Kobi Gal, Barbara Grosz, Leilah Lyons, Ada Ren, Andee Rubin
IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper describes methods for comparative evaluation of the interpretability of models of high dimensional time series data inferred by unsupervised machine learning algorithms. We designed two interpretability tests, each of which evaluates the extent to which a model's output aligns with people's expectations or intuitions of what has occurred in the simulation. We compared the performance of the models on these interpretability tests to their performance on statistical information criteria. We show that the models that optimize interpretability quality differ from those that optimize (statistical) information theoretic criteria. Furthermore, we found that a model using a fully Bayesian approach performed well on both the statistical and human-interpretability measures. |
| Researcher Affiliation | Collaboration | Nicholas Hoernle (1), Kobi Gal (1,2), Barbara Grosz (3), Leilah Lyons (4), Ada Ren (5) and Andee Rubin (5); (1) University of Edinburgh, (2) Ben-Gurion University, (3) Harvard University, (4) NYSCI, (5) TERC. Contact: n.s.hoernle@sms.ed.ac.uk, {gal,grosz}@eecs.harvard.edu, llyons@nysci.org, {ada ren, andee rubin}@terc.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access information (e.g., links to repositories or explicit statements of code release) for the described methodology. |
| Open Datasets | No | The time series data used in this investigation were logs from an immersive simulation like those commonly used in education and healthcare training. The time series was the only source of information about students' interactions, and it was not possible to access the CW simulation except at NYSCI. No public access information (link, DOI, formal citation) for the dataset is provided. |
| Dataset Splits | No | The paper does not provide specific details about training, validation, and test dataset splits. It mentions random sampling of time points for interpretability tests but not formal data splits for model training/evaluation in the typical ML sense. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments. It mentions MTurk workers for human evaluation but not the computational hardware for model training or inference. |
| Software Dependencies | No | The paper mentions using a 'Gibbs sampler' and 'mixture of two multivariate Gaussians with conjugate Normal-inverse Wishart priors' but does not specify any software libraries or frameworks with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x, etc.) |
| Experiment Setup | Yes | The Sticky-HMM approach introduced by Fox et al. [2008] includes a hyperparameter, κ, that biases the model to persist in a state... The increase in the length of periods corresponds to a decrease in the number of latent states. The opposite is true for lower values of κ... 1. MKX: sticky HMM with fixed κ. We use the basic structure... with set values for κ to produce 10 unique models, spanning a wide range of possible settings: κ ∈ {1, 5, 10, 50, 100, 150, 200, 300, 500, 700}. 2. FB: fully Bayesian sticky HMM with a Gamma prior on κ. This approach places a weakly informative, conjugate Gamma prior on the hyperparameter that expresses high uncertainty over the κ values. For models in classes 1 and 2, we use the Gibbs sampler described by Fox et al. [2008] to perform inference over the parameters in the model; this includes inference over the state sequence and thus the period segmentation of the model. The observation distribution was chosen to be a mixture of two multivariate Gaussians with conjugate Normal-inverse Wishart priors. (An illustrative code sketch of the two model classes appears after this table.) |
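The sketch below gives a concrete picture of the two model classes named in the Experiment Setup row: the MKX grid of fixed κ values and the FB variant that draws κ from a weakly informative Gamma prior. It is a minimal Python/NumPy illustration under assumptions of our own (the number of latent states, the Dirichlet concentration `alpha`, and the Gamma shape/rate are placeholder values), not the authors' implementation, which uses the full blocked Gibbs sampler of Fox et al. [2008] with Normal-inverse Wishart mixture emissions.

```python
import numpy as np

# Hypothetical sketch (not the paper's code) of the two sticky-HMM classes
# compared in the Experiment Setup row.

# Class 1 (MKX): ten models, each with a fixed self-persistence bias kappa.
KAPPA_GRID = [1, 5, 10, 50, 100, 150, 200, 300, 500, 700]

def sticky_transition_matrix(n_states, alpha, kappa, rng):
    """Draw a transition matrix whose rows are Dirichlet-distributed with an
    extra mass `kappa` on the self-transition. Larger kappa biases the chain
    to persist in a state, yielding longer periods and fewer latent states."""
    P = np.empty((n_states, n_states))
    for j in range(n_states):
        concentration = np.full(n_states, alpha, dtype=float)
        concentration[j] += kappa  # the "sticky" self-transition bias
        P[j] = rng.dirichlet(concentration)
    return P

# Class 2 (FB): rather than fixing kappa, place a weakly informative Gamma
# prior on it; in a Gibbs sweep kappa would be resampled each iteration.
# Shape/rate here are illustrative placeholders, not the paper's settings.
def sample_kappa_from_prior(rng, shape=1.0, rate=0.01):
    return rng.gamma(shape, 1.0 / rate)

rng = np.random.default_rng(0)
mkx_models = {k: sticky_transition_matrix(10, 1.0, k, rng) for k in KAPPA_GRID}
fb_kappa = sample_kappa_from_prior(rng)  # one prior draw for the FB variant
```

The sketch covers only the transition structure; the paper's emission model (a mixture of two multivariate Gaussians with conjugate Normal-inverse Wishart priors) and the state-sequence inference are handled by the Gibbs sampler of Fox et al. [2008] and are omitted here.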