ADMoE: Anomaly Detection with Mixture-of-Experts from Noisy Labels

Authors: Yue Zhao, Guoqing Zheng, Subhabrata Mukherjee, Robert McCann, Ahmed Awadallah

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive results on eight datasets (including a proprietary enterprise security dataset) demonstrate the effectiveness of ADMoE, where it brings up to 34% performance improvement over not using it.
Researcher Affiliation | Collaboration | Yue Zhao (Carnegie Mellon University); Guoqing Zheng, Subhabrata Mukherjee, Robert McCann, Ahmed Awadallah (Microsoft). Contact: zhaoy@cmu.edu, {zheng, submukhe, robmccan, hassanam}@microsoft.com
Pseudocode | No | The paper describes the ADMoE framework and its components (e.g., MoE layers, gating function, loss function) through textual descriptions and diagrams (such as Figure 2) but does not include explicit pseudocode or algorithm blocks. (An illustrative sketch of a gated MoE layer appears after this table.)
Open Source Code | Yes | See code and appendix: https://github.com/microsoft/admoe
Open Datasets | Yes | As shown in Table 2, we evaluate ADMoE on seven public datasets adapted from AD repositories (Campos et al. 2016; Han et al. 2022) and a proprietary enterprise-security dataset (with t = 3 sets of noisy labels).
Dataset Splits | Yes | For methods with built-in randomness, we run four independent trials and take the average, with a fixed dataset split (70% train, 25% test, 5% validation). (See the split sketch after this table.)
Hardware Specification | No | The paper describes the experimental setup and training process but does not specify the hardware used, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper does not explicitly list software dependencies with specific version numbers (e.g., 'PyTorch 1.9', 'Python 3.8') needed for replication.
Experiment Setup | Yes | Backbone AD Algorithms, Model Capacity, and Hyperparameters. We show the generality of ADMoE in enhancing (i) a simple MLP and (ii) the SOTA DeepSAD (Ruff et al. 2019). To ensure a fair comparison, all methods have an equivalent number of trainable parameters and FLOPs. See Appx. C.2 and the code for additional settings. (A parameter-count sketch follows this table.)
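
The Pseudocode row notes that the paper describes its MoE layers and gating function only in prose and diagrams. For reference, below is a minimal sketch of a generic gated mixture-of-experts layer in PyTorch. It is not the authors' implementation; the class name `MoELayer`, the use of linear experts, and the dense softmax gate are illustrative assumptions only.

```python
# Minimal sketch of a generic mixture-of-experts layer with a softmax
# gating network. NOT the ADMoE authors' code; sizes, expert type, and
# expert count are placeholders.
import torch
import torch.nn as nn


class MoELayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, num_experts: int):
        super().__init__()
        # One small expert per supervision source (e.g., per noisy-label set).
        self.experts = nn.ModuleList(
            [nn.Linear(in_dim, out_dim) for _ in range(num_experts)]
        )
        # Gating network: maps each input to a distribution over experts.
        self.gate = nn.Linear(in_dim, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)                # (B, E)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)   # (B, E, D)
        # Gate-weighted combination of expert outputs.
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)          # (B, D)
```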
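The Dataset Splits row reports a fixed 70% train / 25% test / 5% validation split. A minimal sketch of producing such a split with scikit-learn, assuming a synthetic array dataset and an arbitrary seed (the authors' actual splitting code is in their repository):

```python
# Sketch of a fixed 70/25/5 train/test/validation split. The data and
# random seed are illustrative placeholders.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 16)              # placeholder features
y = np.random.randint(0, 2, size=1000)    # placeholder labels

# First split off the 70% training portion.
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.70, random_state=0
)
# Split the remaining 30% into 25% test and 5% validation (of the full set).
X_test, X_val, y_test, y_val = train_test_split(
    X_rest, y_rest, train_size=25 / 30, random_state=0
)
```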
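The Experiment Setup row states that all compared methods were matched on trainable parameters and FLOPs. A hedged sketch of the parameter-count side of such a check in PyTorch, with placeholder models standing in for the actual baselines:

```python
# Sketch of verifying matched model capacity before a comparison.
# The two models here are placeholders, not the paper's architectures.
import torch.nn as nn


def count_trainable(model: nn.Module) -> int:
    """Number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


baseline = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))
variant = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 1))

# Fail fast if the capacities diverge.
assert count_trainable(baseline) == count_trainable(variant), "capacities differ"
```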