Threshold Learning for Optimal Decision Making

Authors: Nathan F. Lepora

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We examine threshold learning by constructing a reward function that averages over many trials to Wald's cost function that defines decision optimality. ... We address this challenge by proposing two distinct ways to learn the decision thresholds... We find that both methods can successfully learn the thresholds, as validated by comparison against an exhaustive optimization of the reward function. ... Finally, we compare both methods with experimental two-alternative forced choice data, and find that REINFORCE gives a better account of the acquisition (learning) phase, such as converging over a similar number of trials. (See the cost-function note after this table.)
Researcher Affiliation | Academia | Nathan F. Lepora, Department of Engineering Mathematics, University of Bristol, UK. n.lepora@bristol.ac.uk
Pseudocode | Yes | Algorithm: Bayesian optimization applied to optimal decision making (see the Python sketch after this table).
  for n = 1 to N do
    New thresholds from optimizing the acquisition function: (θ0, θ1)_n = argmax_{(θ0, θ1)} α(θ0, θ1; D_{n−1})
    Make the decision with thresholds (θ0, θ1)_n to find reward R(n)
    Augment the data by including the new samples: D_n = (D_{n−1}; (θ0, θ1)_n, R(n))
    Update the statistical (Gaussian process) model of the rewards
  end for
Open Source Code | Yes | The code used to generate the results and figures for this paper is at http://lepora.com/publications.htm
Open Datasets | No | The paper uses a simulated drift-diffusion model ('we consider a drift-diffusion model equivalent to the SPRT with distribution means µ0 = −µ1 = 1/3 and standard deviation σ = 1.') rather than an external, publicly available dataset for training its models. While it compares its findings to animal learning data from other studies ([19], [20]), it does not use these as direct inputs for model training. (A simulation sketch of this decision process follows the table.)
Dataset Splits | No | The paper performs simulations of learning algorithms but does not describe using a dataset with explicit train/validation/test splits for its own experimental setup. It refers to 'validation' as comparison against an exhaustive optimization or against animal learning data, not as a data split.
Hardware Specification | Yes | Benchmarked on an i7 2.7 GHz CPU.
Software Dependencies | No | The paper mentions the 'GPML toolbox and a quasi-Newton optimizer in MATLAB' for fitting hyperparameters but does not provide specific version numbers for any of these software components.
Experiment Setup | Yes | Decision costs c = 0.05, W0 = 0.1 and W1 = 1. ... drift-diffusion model equivalent to the SPRT with distribution means µ0 = −µ1 = 1/3 and standard deviation σ = 1. ... learning rate β (here generally taken as 0.1). ... a value γ = 0.5 was found to be effective, and is used in all simulations using the REINFORCE rule in this paper. ... ns = 10 units per threshold with maximum threshold θmax = 10. (A REINFORCE threshold-learning sketch using these values follows the table.)
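
Cost-function note (referenced from the Research Type row). The abstract row refers to Wald's cost function; a standard sequential-analysis form, given here as an assumption since the paper's exact weighting may differ in detail, is

    C = c ⟨T⟩ + W0 e0 + W1 e1,

where ⟨T⟩ is the mean decision time, c the cost per unit time, and e0, e1 the error probabilities under the two alternatives. The trial-averaged reward used for learning is then approximately R ≈ −C.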
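
A minimal Python sketch of the Bayesian-optimization pseudocode in the table, assuming a scikit-learn Gaussian-process surrogate and an upper-confidence-bound acquisition maximised over a grid of threshold pairs; trial_reward here is a noisy placeholder standing in for a single simulated decision (the function names, kernel and acquisition choice are illustrative assumptions, not the paper's code).

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)

def trial_reward(theta0, theta1):
    # Placeholder for the noisy single-trial reward R(n): an arbitrary smooth
    # optimum plus noise, standing in for one simulated decision.
    return -((theta0 - 2.0) ** 2 + (theta1 - 3.0) ** 2) / 10 + 0.1 * rng.standard_normal()

# Candidate threshold pairs on a grid, plus a few random initial samples D_0
grid = np.array([(a, b) for a in np.linspace(0.5, 10, 20)
                        for b in np.linspace(0.5, 10, 20)])
X = grid[rng.choice(len(grid), size=5, replace=False)]
y = np.array([trial_reward(a, b) for a, b in X])

gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0), normalize_y=True)
for n in range(50):
    gp.fit(X, y)                                  # update the GP model of the rewards
    mean, std = gp.predict(grid, return_std=True)
    acq = mean + 2.0 * std                        # acquisition alpha(theta0, theta1; D_{n-1})
    theta = grid[np.argmax(acq)]                  # new thresholds from the acquisition
    R = trial_reward(*theta)                      # decide with these thresholds, observe R(n)
    X = np.vstack([X, theta])                     # augment data: D_n = (D_{n-1}; theta_n, R(n))
    y = np.append(y, R)

print("best thresholds found:", X[np.argmax(y)])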
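
A minimal sketch of the simulated drift-diffusion / SPRT decision quoted in the Open Datasets row, assuming a discrete-time random walk with drift ±1/3 and unit noise, stopped at thresholds (θ0, θ1), and scored with the cost weights from the Experiment Setup row; the error-penalty convention (W0 charged when H0 is true, W1 when H1 is true) is an assumption.

import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1 / 3, 1.0        # drift magnitude and noise, as quoted from the paper
c, W0, W1 = 0.05, 0.1, 1.0    # decision costs from the Experiment Setup row

def run_trial(theta0, theta1, h1=True):
    """Accumulate evidence until it crosses -theta0 (choose H0) or +theta1 (choose H1)."""
    drift = mu if h1 else -mu
    x, t = 0.0, 0
    while -theta0 < x < theta1:
        x += drift + sigma * rng.standard_normal()
        t += 1
    return t, (x >= theta1)   # decision time and whether H1 was chosen

def mean_cost(theta0, theta1, n_trials=2000):
    """Monte-Carlo estimate of a Wald-style cost: time cost plus error penalties."""
    total = 0.0
    for _ in range(n_trials):
        h1 = rng.random() < 0.5
        t, chose_h1 = run_trial(theta0, theta1, h1)
        error = chose_h1 != h1
        total += c * t + error * (W1 if h1 else W0)   # assumed error-penalty convention
    return total / n_trials

print(mean_cost(2.0, 2.0))    # example: estimated cost for symmetric thresholds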
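
A minimal sketch of REINFORCE threshold learning with the Experiment Setup values, assuming each threshold is decoded from ns Bernoulli units (θ = θmax × mean firing) and that γ decays a running-average reward baseline; the paper's exact unit parameterization and baseline may differ, and the trial simulation repeats the drift-diffusion sketch above so this snippet runs on its own.

import numpy as np

rng = np.random.default_rng(0)
ns, theta_max = 10, 10.0      # units per threshold and maximum threshold (paper values)
beta, gamma = 0.1, 0.5        # learning rate, and assumed baseline-decay role of gamma
c, W0, W1 = 0.05, 0.1, 1.0    # decision costs
mu, sigma = 1 / 3, 1.0        # drift magnitude and noise

def trial_reward(theta0, theta1):
    """One simulated SPRT/drift-diffusion trial; reward = -(time cost + error penalty)."""
    h1 = rng.random() < 0.5
    drift = mu if h1 else -mu
    x, t = 0.0, 0
    while -theta0 < x < theta1:
        x += drift + sigma * rng.standard_normal()
        t += 1
    chose_h1 = x >= theta1
    penalty = W0 if (chose_h1 and not h1) else (W1 if (not chose_h1 and h1) else 0.0)
    return -(c * t + penalty)

w = np.zeros((2, ns))         # weights of the two Bernoulli-unit populations
baseline = 0.0                # running-average reward baseline (assumption)
for n in range(5000):
    p = 1.0 / (1.0 + np.exp(-w))                    # unit firing probabilities
    y = (rng.random(p.shape) < p).astype(float)     # sampled binary unit outputs
    theta0, theta1 = theta_max * y.mean(axis=1)     # decode thresholds from the populations
    R = trial_reward(theta0, theta1)
    w += beta * (R - baseline) * (y - p)            # REINFORCE weight update
    baseline = gamma * baseline + (1 - gamma) * R

print("learned thresholds:", theta_max * (1.0 / (1.0 + np.exp(-w))).mean(axis=1))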