Threshold Learning for Optimal Decision Making

Authors: Nathan F. Lepora

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine threshold learning by constructing a reward function that averages over many trials to Wald's cost function that defines decision optimality. ... We address this challenge by proposing two distinct ways to learn the decision thresholds... We find that both methods can successfully learn the thresholds, as validated by comparison against an exhaustive optimization of the reward function. ... Finally, we compare both methods with experimental two-alternative forced choice data, and find that REINFORCE gives a better account of the acquisition (learning) phase, such as converging over a similar number of trials. (See the cost-function note after this table.)
Researcher Affiliation | Academia | Nathan F. Lepora, Department of Engineering Mathematics, University of Bristol, UK. n.lepora@bristol.ac.uk
Pseudocode | Yes | Algorithm: Bayesian optimization applied to optimal decision making (see the Python sketch after this table).
  for n = 1 to N do
    New thresholds from optimizing the acquisition function: (θ0, θ1)_n = argmax_{(θ0, θ1)} α(θ0, θ1; D_{n−1})
    Make the decision with thresholds (θ0, θ1)_n to find reward R(n)
    Augment the data by including the new samples: D_n = (D_{n−1}; (θ0, θ1)_n, R(n))
    Update the statistical (Gaussian process) model of the rewards
  end for
Open Source Code | Yes | The code used to generate the results and figures for this paper is at http://lepora.com/publications.htm
Open Datasets | No | The paper uses a simulated drift-diffusion model ('we consider a drift-diffusion model equivalent to the SPRT with distribution means µ0 = −µ1 = 1/3 and standard deviation σ = 1.') rather than an external, publicly available dataset for training its models. While it compares its findings to animal learning data from other studies ([19], [20]), it does not use these as direct inputs for model training. (A simulation sketch of this decision process follows the table.)
Dataset Splits | No | The paper performs simulations of learning algorithms but does not describe using a dataset with explicit train/validation/test splits for its own experimental setup. It refers to 'validation' as comparison against an exhaustive optimization or against animal learning data, not as a data split.
Hardware Specification | Yes | Benchmarked on an i7 2.7 GHz CPU.
Software Dependencies | No | The paper mentions the 'GPML toolbox and a quasi-Newton optimizer in MATLAB' for fitting hyperparameters but does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | Decision costs c = 0.05, W0 = 0.1 and W1 = 1. ... drift-diffusion model equivalent to the SPRT with distribution means µ0 = −µ1 = 1/3 and standard deviation σ = 1. ... learning rate β (here generally taken as 0.1). ... a value γ = 0.5 was found to be effective, and is used in all simulations using the REINFORCE rule in this paper. ... ns = 10 units per threshold with maximum threshold θmax = 10. (A REINFORCE threshold-learning sketch using these values follows the table.)
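
Cost-function note (referenced from the Research Type row). The abstract row refers to Wald's cost function; a standard sequential-analysis form, given here as an assumption since the paper's exact weighting may differ in detail, is

    C = c ⟨T⟩ + W0 e0 + W1 e1,

where ⟨T⟩ is the mean decision time, c the cost per unit time, and e0, e1 the error probabilities under the two alternatives. The trial-averaged reward used for learning is then approximately R ≈ −C.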
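
A minimal Python sketch of the Bayesian-optimization pseudocode in the table, assuming a scikit-learn Gaussian-process surrogate and an upper-confidence-bound acquisition maximised over a grid of threshold pairs; trial_reward here is a noisy placeholder standing in for a single simulated decision (the function names, kernel and acquisition choice are illustrative assumptions, not the paper's code).

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def trial_reward(theta0, theta1):
    # Placeholder for the noisy single-trial reward R(n): an arbitrary smooth
    # optimum plus noise, standing in for one simulated decision.
    return -((theta0 - 2.0) ** 2 + (theta1 - 3.0) ** 2) / 10 + 0.1 * rng.standard_normal()

# Candidate threshold pairs on a grid, plus a few random initial samples D_0
grid = np.array([(a, b) for a in np.linspace(0.5, 10, 20)
                        for b in np.linspace(0.5, 10, 20)])
X = grid[rng.choice(len(grid), size=5, replace=False)]
y = np.array([trial_reward(a, b) for a, b in X])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), normalize_y=True)
for n in range(50):
    gp.fit(X, y)                                  # update the GP model of the rewards
    mean, std = gp.predict(grid, return_std=True)
    acq = mean + 2.0 * std                        # acquisition alpha(theta0, theta1; D_{n-1})
    theta = grid[np.argmax(acq)]                  # new thresholds from the acquisition
    R = trial_reward(*theta)                      # decide with these thresholds, observe R(n)
    X = np.vstack([X, theta])                     # augment data: D_n = (D_{n-1}; theta_n, R(n))
    y = np.append(y, R)

print("best thresholds found:", X[np.argmax(y)])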
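
A minimal sketch of the simulated drift-diffusion / SPRT decision quoted in the Open Datasets row, assuming a discrete-time random walk with drift ±1/3 and unit noise, stopped at thresholds (θ0, θ1), and scored with the cost weights from the Experiment Setup row; the error-penalty convention (W0 charged when H0 is true, W1 when H1 is true) is an assumption.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1 / 3, 1.0        # drift magnitude and noise, as quoted from the paper
c, W0, W1 = 0.05, 0.1, 1.0    # decision costs from the Experiment Setup row

def run_trial(theta0, theta1, h1=True):
    """Accumulate evidence until it crosses -theta0 (choose H0) or +theta1 (choose H1)."""
    drift = mu if h1 else -mu
    x, t = 0.0, 0
    while -theta0 < x < theta1:
        x += drift + sigma * rng.standard_normal()
        t += 1
    return t, (x >= theta1)   # decision time and whether H1 was chosen

def mean_cost(theta0, theta1, n_trials=2000):
    """Monte-Carlo estimate of a Wald-style cost: time cost plus error penalties."""
    total = 0.0
    for _ in range(n_trials):
        h1 = rng.random() < 0.5
        t, chose_h1 = run_trial(theta0, theta1, h1)
        error = chose_h1 != h1
        total += c * t + error * (W1 if h1 else W0)   # assumed error-penalty convention
    return total / n_trials

print(mean_cost(2.0, 2.0))    # example: estimated cost for symmetric thresholds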
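
A minimal sketch of REINFORCE threshold learning with the Experiment Setup values, assuming each threshold is decoded from ns Bernoulli units (θ = θmax × mean firing) and that γ decays a running-average reward baseline; the paper's exact unit parameterization and baseline may differ, and the trial simulation repeats the drift-diffusion sketch above so this snippet runs on its own.

import numpy as np

rng = np.random.default_rng(0)
ns, theta_max = 10, 10.0      # units per threshold and maximum threshold (paper values)
beta, gamma = 0.1, 0.5        # learning rate, and assumed baseline-decay role of gamma
c, W0, W1 = 0.05, 0.1, 1.0    # decision costs
mu, sigma = 1 / 3, 1.0        # drift magnitude and noise

def trial_reward(theta0, theta1):
    """One simulated SPRT/drift-diffusion trial; reward = -(time cost + error penalty)."""
    h1 = rng.random() < 0.5
    drift = mu if h1 else -mu
    x, t = 0.0, 0
    while -theta0 < x < theta1:
        x += drift + sigma * rng.standard_normal()
        t += 1
    chose_h1 = x >= theta1
    penalty = W0 if (chose_h1 and not h1) else (W1 if (not chose_h1 and h1) else 0.0)
    return -(c * t + penalty)

w = np.zeros((2, ns))         # weights of the two Bernoulli-unit populations
baseline = 0.0                # running-average reward baseline (assumption)
for n in range(5000):
    p = 1.0 / (1.0 + np.exp(-w))                    # unit firing probabilities
    y = (rng.random(p.shape) < p).astype(float)     # sampled binary unit outputs
    theta0, theta1 = theta_max * y.mean(axis=1)     # decode thresholds from the populations
    R = trial_reward(theta0, theta1)
    w += beta * (R - baseline) * (y - p)            # REINFORCE weight update
    baseline = gamma * baseline + (1 - gamma) * R

print("learned thresholds:", theta_max * (1.0 / (1.0 + np.exp(-w))).mean(axis=1))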