Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Scalable Approximate Bayesian Inference for Outlier Detection under Informative Sampling

Authors: Terrance D. Savitsky

JMLR 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We provide a simulation study to demonstrate that our approach produces unbiased estimation for the outlying cluster under informative sampling. The method is applied for outlier nomination for the Current Employment Statistics survey conducted by the Bureau of Labor Statistics.
Researcher Affiliation	Academia	Terrance D. Savitsky EMAIL U. S. Bureau of Labor Statistics Office of Survey Methods Research Washington, DC 20212, USA
Pseudocode	Yes	Appendix A. Hierarchical Clustering Algorithm. Loop over algorithm blocks A.2 and A.3 until convergence. Algorithm A.1: Initialize local and global cluster objects. Algorithm A.2: Build Local and Global Clusters. Algorithm A.3: Merge global clusters.
Open Source Code	No	We implement both algorithms we present, below, in the growclusters package for R (R Core Team 2014), which is written in C++ for fast computation and available from the authors on request.
Open Datasets	No	The U.S. Bureau of Labor Statistics (BLS) administers the Current Employment Statistics (CES) survey to over 350000 non-farm, public and private business establishments across the U.S. on a monthly basis, receiving approximately 270000 submitted responses in each month.
Dataset Splits	Yes	Observations are next randomly allocated into two sets of equal size; one used to train the model and the other to evaluate the resultant energy.
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	We implement both algorithms we present, below, in the growclusters package for R (R Core Team 2014), which is written in C++ for fast computation and available from the authors on request.
Experiment Setup	Yes	We chose the values of (λL = 1232, λK = 2254) that maximized the C index for our sampling-weighted hierarchical clustering model.