Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scalable Approximate Bayesian Inference for Outlier Detection under Informative Sampling
Authors: Terrance D. Savitsky
JMLR 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a simulation study to demonstrate that our approach produces unbiased estimation for the outlying cluster under informative sampling. The method is applied for outlier nomination for the Current Employment Statistics survey conducted by the Bureau of Labor Statistics. |
| Researcher Affiliation | Academia | Terrance D. Savitsky, U.S. Bureau of Labor Statistics, Office of Survey Methods Research, Washington, DC 20212, USA |
| Pseudocode | Yes | Appendix A. Hierarchical Clustering Algorithm: Loop over algorithm blocks A.2 and A.3 until convergence. Algorithm A.1: Initialize local and global cluster objects. Algorithm A.2: Build Local and Global Clusters. Algorithm A.3: Merge global clusters. |
| Open Source Code | No | We implement both algorithms we present, below, in the growclusters package for R (R Core Team 2014), which is written in C++ for fast computation and available from the authors on request. |
| Open Datasets | No | The U.S. Bureau of Labor Statistics (BLS) administers the Current Employment Statistics (CES) survey to over 350000 non-farm, public and private business establishments across the U.S. on a monthly basis, receiving approximately 270000 submitted responses in each month. |
| Dataset Splits | Yes | Observations are next randomly allocated into two sets of equal size; one used to train the model and the other to evaluate the resultant energy. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | We implement both algorithms we present, below, in the growclusters package for R (R Core Team 2014), which is written in C++ for fast computation and available from the authors on request. |
| Experiment Setup | Yes | We chose the values of (λ_L = 1232, λ_K = 2254) that maximized the C index for our sampling-weighted hierarchical clustering model. |
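
The Dataset Splits row quotes a random allocation of observations into two sets of equal size, one for training and one for evaluation. A minimal Python sketch of such a split is given below. It is not the paper's implementation (which lives in the growclusters R package); the function name `split_in_half`, the seed, and the toy data are assumptions made for illustration only.

```python
import random


def split_in_half(items, seed=0):
    """Randomly allocate items into two sets of equal size.

    Hypothetical helper: if the number of items is odd, the second
    set receives the extra element.
    """
    rng = random.Random(seed)   # local generator, so the split is repeatable
    shuffled = list(items)      # copy, leaving the caller's data untouched
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


# Toy data standing in for survey observations (assumption).
observations = list(range(10))
train, held_out = split_in_half(observations, seed=42)
```

Fixing the seed makes the allocation repeatable, which matters when the held-out half is used to compare candidate parameter settings.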