Notice: The reproducibility variables underlying each score are classified by an automated LLM-based pipeline that was validated against a manually labeled dataset. Because LLM-based classification introduces uncertainty and potential bias, scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Learning Procedural Abstractions and Evaluating Discrete Latent Temporal Structure
Authors: Karan Goel, Emma Brunskill
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EXPERIMENTS WITH EVALUATION CRITERIA, 6 EXPERIMENTS WITH PRISM, Table 1: Comparison with baselines on BEES and the JIGSAWS surgical dataset., Table 2: Comparison with Sener & Yao (2018) on the INRIA dataset with the Munkres score. |
| Researcher Affiliation | Academia | Karan Goel, Department of Computer Science, Stanford University; Emma Brunskill, Department of Computer Science, Stanford University |
| Pseudocode | Yes | Algorithm 1 Calculation of RSS |
| Open Source Code | Yes | The results presented for the evaluation criteria developed in this paper can be reproduced using code available at https://github.com/Stanford-AI4HI/ICLR2019_evaluating_discrete_temporal_structure. and The results presented for PRISM can be reproduced using code available at https://github.com/Stanford-AI4HI/ICLR2019_prism, which contains a Python implementation of PRISM. |
| Open Datasets | Yes | We use 2 common benchmark datasets (Fox et al., 2008b; 2009; 2014; Zhou et al., 2008; 2013). BEES consists of 6 time-series..., JIGSAWS dataset (Gao et al., 2014)..., Breakfast actions (Kuehne et al., 2014)..., INRIA instructional videos (Alayrac et al., 2016)... |
| Dataset Splits | No | The paper does not state train/validation/test splits (as percentages or counts) or describe a splitting procedure. The datasets are named, but how they were partitioned for training, validation, and testing cannot be reproduced from the text. |
| Hardware Specification | No | The paper gives no hardware details (e.g., CPU or GPU models, memory, or cloud instance types) for the machines used to run the experiments. |
| Software Dependencies | No | The paper mentions 'Python implementation of PRISM' but does not specify version numbers for Python or any other software dependencies, libraries, or frameworks used. |
| Experiment Setup | Yes | We use the hyperparameter settings given in Table 10 for PRISM. The Bayesian HMM has a single hyperparameter α which represents the hyperparameter for the Dirichlet prior over the transition matrix. and Table 10: Hyperparameter settings used for PRISM experiments. |
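The Experiment Setup row notes that the Bayesian HMM has a single hyperparameter α for the Dirichlet prior over the transition matrix. What follows is a minimal sketch of what such a prior does, assuming a standard Bayesian HMM in which each row of the K x K transition matrix independently follows a symmetric Dirichlet(α). This is not the PRISM implementation. The function names are hypothetical, and the α values used below are illustrative, not the settings from Table 10. Small α (below 1) concentrates each row's mass on a few next states. Large α pushes rows toward uniform. Given observed transitions, conjugacy gives a closed-form posterior mean.

```python
import random
from typing import List, Sequence


def sample_transition_matrix(num_states: int, alpha: float, seed: int = 0) -> List[List[float]]:
    """Hypothetical helper: draw each row of a K x K transition matrix from Dirichlet(alpha, ..., alpha)."""
    rng = random.Random(seed)
    matrix = []
    for _ in range(num_states):
        # A Dirichlet draw equals independent Gamma(alpha, 1) draws normalized to sum to 1.
        gammas = [rng.gammavariate(alpha, 1.0) for _ in range(num_states)]
        total = sum(gammas)
        matrix.append([g / total for g in gammas])
    return matrix


def posterior_mean_transitions(states: Sequence[int], num_states: int, alpha: float) -> List[List[float]]:
    """Hypothetical helper: posterior mean of the transition matrix given one latent state sequence."""
    # Count observed transitions i -> j between consecutive time steps.
    counts = [[0] * num_states for _ in range(num_states)]
    for prev, nxt in zip(states[:-1], states[1:]):
        counts[prev][nxt] += 1
    # Dirichlet-multinomial conjugacy: row i's posterior is Dirichlet(alpha + counts[i]),
    # whose mean is that parameter vector normalized to sum to 1.
    result = []
    for row in counts:
        shifted = [c + alpha for c in row]
        total = sum(shifted)
        result.append([s / total for s in shifted])
    return result


if __name__ == "__main__":
    print(sample_transition_matrix(3, alpha=0.5, seed=1))
    print(posterior_mean_transitions([0, 0, 1, 1, 2, 0], 3, alpha=1.0))
```

For the sequence `[0, 0, 1, 1, 2, 0]` with α = 1, state 2 is followed once by state 0. Its posterior-mean row is therefore `[0.5, 0.25, 0.25]`: α acts as a pseudo-count that smooths rows with little data.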