Threshold Learning for Optimal Decision Making
Authors: Nathan F. Lepora
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We examine threshold learning by constructing a reward function that averages over many trials to Wald's cost function that defines decision optimality. ... We address this challenge by proposing two distinct ways to learn the decision thresholds... We find that both methods can successfully learn the thresholds, as validated by comparison against an exhaustive optimization of the reward function. ... Finally, we compare both methods with experimental two-alternative forced choice data, and find that REINFORCE gives a better account of the acquisition (learning) phase, such as converging over a similar number of trials. |
| Researcher Affiliation | Academia | Nathan F. Lepora Department of Engineering Mathematics, University of Bristol, UK n.lepora@bristol.ac.uk |
| Pseudocode | Yes | Algorithm: Bayesian optimization applied to optimal decision making. for n = 1 to N do: obtain new thresholds by optimizing the acquisition function, (θ0, θ1)_n = argmax_{(θ0, θ1)} α(θ0, θ1; D_{n−1}); make the decision with thresholds (θ0, θ1)_n to obtain reward R(n); augment the data with the new sample, D_n = (D_{n−1}; (θ0, θ1)_n, R(n)); update the statistical (Gaussian process) model of the rewards; end for. (A runnable sketch of this loop appears after the table.) |
| Open Source Code | Yes | The code used to generate the results and figures for this paper is at http://lepora.com/publications.htm |
| Open Datasets | No | The paper uses a simulated drift-diffusion model ('we consider a drift-diffusion model equivalent to the SPRT with distribution means −µ0 = µ1 = 1/3 and standard deviation σ = 1.') rather than an external, publicly available dataset for training its models. While it compares its findings to animal learning data from other studies ([19], [20]), it does not use these data as direct inputs for model training. |
| Dataset Splits | No | The paper performs simulations of learning algorithms but does not describe using a dataset with explicit train/validation/test splits for its own experimental setup. It refers to 'validation' as comparison against an exhaustive optimization or against animal learning data, not as a data split. |
| Hardware Specification | Yes | Benchmarked on an i7 2.7 GHz CPU. |
| Software Dependencies | No | The paper mentions the 'GPML toolbox and a quasi-Newton optimizer in MATLAB' for fitting hyperparameters but does not provide version numbers for any of these software components. |
| Experiment Setup | Yes | Decision costs c = 0.05, W0 = 0.1 and W1 = 1. ... drift-diffusion model equivalent to the SPRT with distribution means −µ0 = µ1 = 1/3 and standard deviation σ = 1. ... learning rate β (here generally taken as 0.1). ... a value γ = 0.5 was found to be effective, and is used in all simulations using the REINFORCE rule in this paper. ... ns = 10 units per threshold with maximum threshold θmax = 10. (Sketches of the reward simulation, the Bayesian-optimization loop, and the REINFORCE rule under these settings follow the table.) |
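
For reference, the settings quoted in the table fit together as follows: each trial accumulates a log-likelihood ratio until it crosses one of the two decision thresholds, and the reward is the negative of the Wald-style cost c·T plus an error penalty, averaged over trials. The Python sketch below illustrates this under the quoted values (−µ0 = µ1 = 1/3, σ = 1, c = 0.05, W0 = 0.1, W1 = 1); the sign convention on the means, the pairing of W0/W1 with the two alternatives, and the helper names `run_trial`/`reward` are assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Parameter values quoted in the table; the sign convention -mu0 = mu1 is an assumption.
MU = 1.0 / 3.0      # magnitude of the two distribution means (+MU / -MU)
SIGMA = 1.0         # observation standard deviation
C = 0.05            # cost per observation (decision time)
W0, W1 = 0.1, 1.0   # error penalties for the two alternatives (pairing assumed)

def run_trial(theta0, theta1, max_steps=10_000):
    """Simulate one SPRT-style drift-diffusion trial.

    The accumulated log-likelihood ratio is integrated until it crosses
    +theta1 (choose alternative 1) or -theta0 (choose alternative 0).
    Returns (decision time, true alternative, error flag).
    """
    true_alt = int(rng.integers(2))
    mean = MU if true_alt == 1 else -MU
    llr, t = 0.0, 0
    while -theta0 < llr < theta1 and t < max_steps:
        x = rng.normal(mean, SIGMA)
        llr += 2.0 * MU * x / SIGMA**2   # log-likelihood-ratio increment for +/-MU Gaussians
        t += 1
    choice = 1 if llr >= theta1 else 0
    return t, true_alt, int(choice != true_alt)

def reward(theta0, theta1, n_trials=1000):
    """Average reward: the negative of the Wald-style cost c*T + W*error."""
    total = 0.0
    for _ in range(n_trials):
        t, true_alt, err = run_trial(theta0, theta1)
        penalty = (W0 if true_alt == 0 else W1) * err
        total += -(C * t + penalty)
    return total / n_trials

print(reward(1.0, 1.0))   # e.g. evaluate a candidate threshold pair
```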
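
Below is a minimal sketch of the Bayesian-optimization loop from the pseudocode row, assuming scikit-learn's Gaussian-process regressor in place of the GPML/MATLAB stack the paper reports, an upper-confidence-bound acquisition function maximized over a candidate grid, and 100 trials per reward evaluation; `reward()` is the helper from the previous sketch.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

# Candidate threshold pairs on a grid up to theta_max = 10 (grid resolution assumed).
grid = np.array([[t0, t1]
                 for t0 in np.linspace(0.1, 10.0, 25)
                 for t1 in np.linspace(0.1, 10.0, 25)])

rng = np.random.default_rng(1)
X, y = [], []   # the growing dataset D_n of tried thresholds and noisy rewards
gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), alpha=1e-2, normalize_y=True)

for n in range(50):
    if len(X) < 5:
        # Seed with random threshold pairs before trusting the GP surrogate.
        theta = grid[rng.integers(len(grid))]
    else:
        gp.fit(np.array(X), np.array(y))                 # update the GP model of the rewards
        mu, sd = gp.predict(grid, return_std=True)
        theta = grid[np.argmax(mu + 2.0 * sd)]           # UCB acquisition (assumed choice)
    r = reward(theta[0], theta[1], n_trials=100)         # noisy reward for this threshold pair
    X.append(theta)                                      # augment D_n with the new sample
    y.append(r)

best = X[int(np.argmax(y))]
print("estimated optimal thresholds:", best)
```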
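
The table also quotes REINFORCE hyperparameters (β = 0.1, γ = 0.5, ns = 10 units per threshold, θmax = 10). The sketch below shows one way such a rule could set the thresholds, assuming each threshold is coded by a population of Bernoulli logistic units updated with the standard Williams REINFORCE rule against a decaying reward baseline; the paper's exact parameterization may differ. `reward()` is again the helper from the first sketch.

```python
import numpy as np

rng = np.random.default_rng(2)

NS = 10           # units per threshold (from the table)
THETA_MAX = 10.0  # maximum threshold (from the table)
BETA = 0.1        # learning rate beta (from the table)
GAMMA = 0.5       # gamma from the table, used here as a reward-baseline decay (assumed role)

w = np.zeros((2, NS))   # one bias weight per Bernoulli unit, for each threshold
baseline = 0.0          # running estimate of the average reward

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

for trial in range(2000):
    p = sigmoid(w)                                # firing probabilities of the units
    s = (rng.random(p.shape) < p) * 1.0           # sampled binary outputs
    theta0, theta1 = THETA_MAX * s.mean(axis=1)   # population code for the two thresholds
    r = reward(theta0, theta1, n_trials=1)        # single noisy trial reward
    # Williams-style REINFORCE update against a decaying baseline.
    w += BETA * (r - baseline) * (s - p)
    baseline = GAMMA * baseline + (1.0 - GAMMA) * r

print("learned thresholds:", THETA_MAX * sigmoid(w).mean(axis=1))
```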