Optimal decision-making with time-varying evidence reliability

Authors: Jan Drugowitsch, Ruben Moreno-Bote, Alexandre Pouget

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Furthermore, a comparison to simpler decision-making heuristics demonstrates when such heuristics fail to feature comparable performance. In particular, we derive Bayes-optimal evidence accumulation for our task setup, and compute the optimal policy for such cases by dynamic programming. To do so, we borrow concepts from continuous-time stochastic control to keep the computational complexity linear in the process space size (rather than quadratic for the naïve approach). Finally, we characterize how the optimal policy depends on parameters that determine the evidence reliability time-course, and show that simpler, heuristic policies fail to match the optimal performance for particular sub-regions of this parameter space. ... This optimization was performed by the Subplex algorithm [16] in the NLopt toolkit [17], where the ER / RR was found by Monte Carlo simulations.
Researcher Affiliation | Academia | 1 Dépt. des Neurosciences Fondamentales, Université de Genève, CH-1211 Genève 4, Switzerland; jdrugo@gmail.com, alexandre.pouget@unige.ch. 2 Research Unit, Parc Sanitari Sant Joan de Déu and University of Barcelona, 08950 Barcelona, Spain; rmoreno@fsjd.org
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. The methodology is described using mathematical equations and descriptions of computational techniques such as dynamic programming and PDE solving.
Open Source Code | No | No explicit statement or link providing concrete access to source code for the methodology described in this paper was found.
Open Datasets | No | The paper describes a theoretical model and computational simulations. It does not use or refer to any publicly available or open datasets for its experiments.
Dataset Splits | No | The paper describes a computational study based on dynamic programming and Monte Carlo simulations, rather than using traditional datasets with training, validation, and test splits. Therefore, no specific dataset split information is provided.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running experiments were mentioned in the paper.
Software Dependencies | No | The paper mentions 'Subplex algorithm [16] in the NLopt toolkit [17]' but does not provide specific version numbers for NLopt or other ancillary software dependencies.
Experiment Setup | Yes | In all cases, we computed the optimal bounds by dynamic programming on a 200 × 200 grid on (g, τ), using δt = 0.005. g spanned its whole [0, 1] range, and τ ranged from 0 to twice the 99th percentile of its steady-state distribution. We used $\max_{g,\tau} |V^n(g,\tau) - V^{n-1}(g,\tau)| \le 10^{-3}\,\delta t$ as convergence criterion for value iteration.
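The stopping rule quoted in the Experiment Setup entry (iterate until the largest change in the value function between successive sweeps is at most 10^-3 · δt) can be illustrated with a minimal Python sketch. This is not the authors' code, which the summary reports as unavailable. The Bellman update below is a hypothetical one-dimensional toy (belief g only, symmetric random-walk transitions, an assumed accumulation cost `cost`) standing in for the paper's two-dimensional (g, τ) problem; only δt = 0.005, the 200-point grid on [0, 1], and the tolerance are taken from the text.

```python
import numpy as np


def value_iteration(bellman_update, v0, dt, rel_tol=1e-3, max_iter=100_000):
    """Iterate V <- bellman_update(V) until max |V^n - V^(n-1)| <= rel_tol * dt."""
    v = v0
    for n in range(1, max_iter + 1):
        v_new = bellman_update(v)
        if np.max(np.abs(v_new - v)) <= rel_tol * dt:
            return v_new, n
        v = v_new
    raise RuntimeError("value iteration did not converge")


dt = 0.005                         # time step quoted in the setup
cost = 0.1                         # ASSUMPTION: hypothetical cost per unit time
g = np.linspace(0.0, 1.0, 200)     # belief grid spanning [0, 1]
v_decide = np.maximum(g, 1.0 - g)  # value of deciding now (toy reward of 1 if correct)


def toy_bellman(v):
    """Toy update: decide now, or pay cost * dt and move one grid cell left or right."""
    v_pad = np.pad(v, 1, mode="edge")  # repeat edge values at g = 0 and g = 1
    v_wait = 0.5 * (v_pad[:-2] + v_pad[2:]) - cost * dt
    return np.maximum(v_decide, v_wait)


v, n_iter = value_iteration(toy_bellman, v_decide.copy(), dt)
print(f"stopped after {n_iter} sweeps")
```

Scaling the tolerance with δt keeps the criterion meaningful when the time step is refined, since the per-sweep change in the value function itself shrinks with δt.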