Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Bayesian Inference of Temporal Task Specifications from Demonstrations

Authors: Ankit Shah, Pritish Kamath, Julie A. Shah, Shen Li

NeurIPS 2018

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluated the performance of our model within two different domains: a synthetic domain in which we could easily vary the complexity of the ground truth specifications, and a domain representing the real-world task of setting a dinner table, a task often incorporated into studies of learning from demonstration ([17]).
Researcher Affiliation Academia Ankit Shah, CSAIL, MIT, EMAIL; Pritish Kamath, CSAIL, MIT, EMAIL; Shen Li, CSAIL, MIT, EMAIL; Julie Shah, CSAIL, MIT, EMAIL
Pseudocode Yes Algorithm 1 Sample Sets Of Linear Chains
Open Source Code No The paper states 'We implemented our probabilistic model in webppl [9]', but does not provide concrete access or an explicit statement about the availability of their own implementation's source code.
Open Datasets No The paper uses a synthetic domain and a self-collected dataset ('A total of 71 demonstrations were collected') for the dinner table task, but does not provide concrete access information (link, DOI, formal citation) for a publicly available or open dataset.
Dataset Splits No The paper mentions using 'randomly sampled subsets of different sizes' for training, but does not provide specific details on training/validation/test dataset splits (e.g., percentages, sample counts, or predefined splits) necessary for reproduction.
Hardware Specification Yes The inference was run on a desktop with an Intel i7-7700 processor.
Software Dependencies No The paper mentions implementing the model in 'webppl [9]', but does not provide specific version numbers for webppl or any other software dependencies, libraries, or solvers used in the experiments.
Experiment Setup Yes The hyperparameters, including those defined in Table 1 and ϵ, were set as follows: p_E, p_G = 0.8; p_part = 0.3; N_new = 5; ϵ = 4 log(2) (|τ| + |Ω| + 0.5|Ω|(|Ω| − 1)). These values were held constant for all evaluation scenarios. ... The posterior distribution of candidate formulas is constructed using webppl's Markov chain Monte Carlo (MCMC) sampling algorithm from 10,000 samples, with 100 samples used as burn-in.
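
As a rough illustration of the quoted setup, the sketch below computes ϵ from the sizes of τ and Ω. It reads the formula as ϵ = 4 log(2) (|τ| + |Ω| + 0.5|Ω|(|Ω| − 1)). It also assumes that log is the natural logarithm and that |τ| and |Ω| are plain set sizes, neither of which is stated in the excerpt. The function name, argument names, and dictionary keys are hypothetical, and the paper's own implementation is in webppl, not Python.

```python
import math

# Hyperparameter values as quoted in the Experiment Setup row.
# The dictionary keys are illustrative names, not identifiers from the paper.
HYPERPARAMS = {
    "p_E": 0.8,
    "p_G": 0.8,
    "p_part": 0.3,
    "N_new": 5,
    "mcmc_samples": 10_000,
    "mcmc_burn_in": 100,
}


def epsilon(tau_size: int, omega_size: int) -> float:
    """Return 4 * log(2) * (|tau| + |Omega| + 0.5 * |Omega| * (|Omega| - 1)).

    Assumption: log is the natural logarithm, and both arguments are
    non-negative set sizes.
    """
    if tau_size < 0 or omega_size < 0:
        raise ValueError("set sizes must be non-negative")
    # 0.5 * n * (n - 1) is the number of unordered pairs among n elements.
    pair_count = 0.5 * omega_size * (omega_size - 1)
    return 4 * math.log(2) * (tau_size + omega_size + pair_count)


if __name__ == "__main__":
    # Example with made-up sizes: |tau| = 3, |Omega| = 4.
    print(epsilon(3, 4))
```

Because the last term counts unordered pairs of elements of Ω, ϵ grows quadratically in |Ω| and only linearly in |τ| under this reading.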