Bayesian Active Model Selection with an Application to Automated Audiometry
Authors: Jacob Gardner, Gustavo Malkomes, Roman Garnett, Kilian Q. Weinberger, Dennis Barbour, John P. Cunningham
NeurIPS 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test BAMS on our NIHL detection task, we evaluate our algorithm using audiometric data, comparing to several baselines. From the results of a small-scale clinical trial, we have examples of high-fidelity audiometric functions inferred for several human patients using the method of Gardner et al. [8]. We may use these to simulate audiometric examinations of healthy patients using different methods to select tone presentations. We simulate patients with NIHL by adjusting ground truth inferred from nine healthy patients with in-model samples from our notch mean prior. Recall that high-resolution audiogram data is extremely scarce. We first took a thorough pure-tone audiometric test of each of nine patients from our trial with normal hearing, using 100 samples selected using the algorithm in [8] on the domain X = [250, 8000] Hz × [−10, 80] dB HL, typical ranges for audiometric testing [6]. We inferred the audiometric function over the entire domain from the measured responses, using the healthy-patient GP model Mhealthy with parameters learned via MLE. The observation model was p(y = 1 | f) = Φ(f), where Φ is the standard normal CDF, and approximate GP inference was performed via a Laplace approximation. We then used the approximate GP posterior p(f | D, θ̂, Mhealthy) for this patient as ground truth for simulating a healthy patient's responses. The posterior probability of tone detection learned from one patient is shown in the background of Figure 1(a). We simulated a healthy patient's response to a given query tone x∗ by sampling a conditionally independent Bernoulli random variable with parameter p(y = 1 | x∗, D, θ̂, Mhealthy). We simulated a patient with NIHL by then drawing notch parameters (the parameters of (14)) from an expert-informed prior, adding the corresponding notch to the learned healthy ground-truth latent mean, recomputing the detection probabilities, and proceeding as above. |
| Researcher Affiliation | Academia | Jacob R. Gardner CS, Cornell University Ithaca, NY 14850 jrg365@cornell.edu Gustavo Malkomes CSE, WUSTL St. Louis, MO 63130 luizgustavo@wustl.edu Roman Garnett CSE, WUSTL St. Louis, MO 63130 garnett@wustl.edu Kilian Q. Weinberger CS, Cornell University Ithaca, NY 14850 kqw4@cornell.edu Dennis Barbour BME, WUSTL St. Louis, MO 63130 dbarbour@wustl.edu John P. Cunningham Statistics, Columbia University New York, NY 10027 jpc2181@columbia.edu |
| Pseudocode | No | The paper describes the proposed method in detail, including its steps and calculations, but it does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper only promises a future code release and provides no repository or link: 'See the Appendix for explicit formulas for common likelihoods and a description of general-purpose, reusable code we will release in conjunction with this manuscript to ease implementation.' |
| Open Datasets | No | The paper uses 'patient data from a clinical trial' and simulates based on audiometric functions inferred from nine healthy patients. However, it does not provide concrete access information (e.g., specific link, DOI, repository name, or formal citation for public access) for the raw patient data or the simulated dataset used for experiments. |
| Dataset Splits | No | The paper describes simulating audiometric tests and initializing with 'five random tones' followed by actively selecting up to '25 additional tones'. This describes the active learning process, but it does not specify traditional training, validation, or test dataset splits (e.g., percentages or exact sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., exact GPU/CPU models, memory) used for running the experiments. It only mentions a comparison of computational time, stating 'expending 1–2 seconds per iteration, whereas these mentioned techniques would take several hours to selected the next point to query.' |
| Software Dependencies | No | The paper mentions statistical and machine learning concepts and methods (e.g., Gaussian processes, Laplace approximation, MGP), but it does not list any specific software, libraries, or programming languages with version numbers (e.g., Python 3.8, PyTorch 1.9, scikit-learn 0.24). |
| Experiment Setup | Yes | Each algorithm shared a candidate set of 10 000 quasirandom tones X generated using a scrambled Halton set so as to densely cover the two-dimensional search space. ... For each audiometric test simulation, we initialized with five random tones, then allowed each algorithm to actively select a maximum of 25 additional tones... The exact intensity at which BAMS samples is determined by the prior over the notch-depth parameter d. When we changed the notch depth prior to support shallower or deeper notches (data not shown), BAMS sampled at lower or higher intensities, respectively, to continue to maximize model disagreement. Similarly, the spacing between samples is controlled by the prior over the notch-width parameter w. |
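The candidate-set construction quoted in the Experiment Setup row can be reproduced closely with off-the-shelf tools. The sketch below generates 10,000 quasirandom tones from a scrambled Halton set over [250, 8000] Hz × [−10, 80] dB HL. This assumes frequency is sampled linearly; the paper does not say whether frequency was placed on a log scale (common in audiometry), and the seed is arbitrary.

```python
import numpy as np
from scipy.stats import qmc

# 10,000 quasirandom candidate tones densely covering the 2-D search
# space [250, 8000] Hz x [-10, 80] dB HL, via a scrambled Halton set.
sampler = qmc.Halton(d=2, scramble=True, seed=0)
unit_points = sampler.random(n=10_000)  # low-discrepancy points in [0, 1]^2
candidates = qmc.scale(unit_points, l_bounds=[250.0, -10.0],
                       u_bounds=[8000.0, 80.0])
```

Each row of `candidates` is a (frequency, intensity) pair; at every iteration the acquisition function would be evaluated over this fixed set and the maximizing tone presented next.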
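The patient-simulation procedure described in the Research Type row can be sketched as follows: pass the ground-truth latent mean through the observation model Φ(f) and draw a conditionally independent Bernoulli response, optionally adding a notch to the latent mean to simulate NIHL. The `healthy_latent` callable and the Gaussian notch shape are stand-ins; the paper's parametric notch form (its Eq. 14) and the GP-posterior ground truth are not reproduced here.

```python
import numpy as np
from scipy.stats import norm


def notch(freq, center, width, depth):
    # Hypothetical Gaussian-shaped dip in the latent mean around `center`;
    # the paper's actual parametric notch (Eq. 14) may differ in form.
    return -depth * np.exp(-0.5 * ((freq - center) / width) ** 2)


def simulate_response(tone, healthy_latent, rng, notch_params=None):
    """Simulate one binary detection response to `tone` = (freq, intensity).

    `healthy_latent` maps a tone to the ground-truth latent mean inferred
    from a healthy patient; `notch_params` (center, width, depth), if given,
    would be drawn from the expert-informed prior to simulate NIHL.
    """
    f = healthy_latent(tone)
    if notch_params is not None:
        f = f + notch(tone[0], **notch_params)
    p = norm.cdf(f)          # observation model: p(y = 1 | f) = Phi(f)
    return rng.random() < p  # conditionally independent Bernoulli draw
```

For example, `simulate_response((2000.0, 40.0), healthy_latent, np.random.default_rng(0))` yields one simulated response; repeating this over actively selected tones reproduces the simulated audiometric examination.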