Multivariate Conditional Outlier Detection and Its Clinical Application

Authors: Charmgil Hong, Milos Hauskrecht

AAAI 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present experimental results on a clinical dataset obtained from Cincinnati Children's Hospital Medical Center (Pestian et al. 2007). The dataset contains 978 instances; each consists of 1,449 features (x) extracted from clinical progress notes and 45 binary response variables (y) representing the diseases diagnosed. We compare our Multivariate Conditional Outlier DEtection method (MCODE) (Hong and Hauskrecht 2015) with two state-of-the-art multivariate outlier detection methods: Local Outlier Factor (LOF) (Breunig et al. 2000) and One-class SVM (OS) (Amer, Goldstein, and Abdennadher 2013). We performed 10-fold cross validation; on each round, we perturbed 0.5% of the data by randomly flipping 1 to 5 response variables (hence, the outliers represent misdiagnoses), and evaluated how the methods identify the outliers. Figure 1 shows the results in terms of the area under the precision-recall curve (AUCPR).
Researcher Affiliation | Academia | Charmgil Hong and Milos Hauskrecht, Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15260, {charmgil, milos}@cs.pitt.edu
Pseudocode | No | No pseudocode or algorithm blocks are provided in the paper.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | We present experimental results on a clinical dataset obtained from Cincinnati Children's Hospital Medical Center (Pestian et al. 2007).
Dataset Splits | Yes | We performed 10-fold cross validation; on each round, we perturbed 0.5% of the data by randomly flipping 1 to 5 response variables (hence, the outliers represent misdiagnoses), and evaluated how the methods identify the outliers.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types) used for running experiments are mentioned.
Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper.
Experiment Setup | No | The paper describes the general experimental setup (e.g., 10-fold cross-validation, data perturbation) and the methods compared, but it does not provide specific hyperparameters or system-level training settings for the models used (e.g., learning rates, batch sizes, or specific parameters for LOF or One-class SVM).
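To make the reported perturbation protocol concrete, the following is a minimal standard-library Python sketch of the outlier-injection step described in the paper: 0.5% of instances have 1 to 5 of their 45 binary response variables flipped, and those instances become the ground-truth outliers (simulated misdiagnoses). The `perturb` function, the fixed seed, and the all-zeros placeholder label matrix are illustrative assumptions, not code from the paper.

```python
import random

def perturb(Y, frac=0.005, min_flips=1, max_flips=5, seed=0):
    """Inject synthetic misdiagnosis outliers: for `frac` of the instances,
    flip between min_flips and max_flips of the binary response variables.
    Returns the perturbed label matrix and a 0/1 outlier indicator list."""
    rng = random.Random(seed)
    Y = [row[:] for row in Y]                   # copy; leave the input intact
    n, d = len(Y), len(Y[0])
    n_outliers = max(1, round(frac * n))        # 0.5% of 978 instances -> 5
    is_outlier = [0] * n
    for i in rng.sample(range(n), n_outliers):
        k = rng.randint(min_flips, max_flips)   # how many responses to flip
        for j in rng.sample(range(d), k):       # flip k distinct response bits
            Y[i][j] = 1 - Y[i][j]
        is_outlier[i] = 1
    return Y, is_outlier

# Dimensions from the paper: 978 instances, 45 binary responses.
# (All-zeros labels are a placeholder for the real clinical data.)
Y = [[0] * 45 for _ in range(978)]
Y_pert, labels = perturb(Y)
print(sum(labels))  # → 5 perturbed instances
```

In the paper's setup a detector (MCODE, LOF, or One-class SVM) would then score each instance, and the scores would be compared against `labels` to compute AUCPR within each cross-validation round.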