Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
AR-Pro: Counterfactual Explanations for Anomaly Repair with Formal Properties
Authors: Xiayan Ji, Anton Xue, Eric Wong, Oleg Sokolsky, Insup Lee
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our anomaly explainability framework, AR-Pro, on vision (MVTec, VisA) and time-series (SWaT, WADI, HAI) anomaly datasets. |
| Researcher Affiliation | Academia | Xiayan Ji, Anton Xue, Eric Wong, Oleg Sokolsky, Insup Lee; Department of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104; EMAIL |
| Pseudocode | No | The paper describes methods and processes but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | The code used for the experiments is accessible at: https://github.com/xjiae/arpro. |
| Open Datasets | Yes | We demonstrate the effectiveness of our anomaly explainability framework, AR-Pro, on vision (MVTec, VisA) and time-series (SWaT, WADI, HAI) anomaly datasets. |
| Dataset Splits | Yes | Each experiment employs a representative anomaly detector and dataset with predefined train-test splits. |
| Hardware Specification | Yes | All experiments were done on a server with three NVIDIA GeForce RTX 4090 GPUs. |
| Software Dependencies | No | The paper mentions using GPT-2, Llama2, DDPM, and Diffusion-TS models, and frameworks like anomalib and Hugging Face implementations, but does not provide specific version numbers for underlying software dependencies such as Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | Both FastFlow and Efficient-AD were trained with AdamW and a learning rate of 10⁻⁴ until convergence. Both our versions of GPT-2 and Llama-2 were trained with AdamW and a learning rate of 10⁻⁵ until convergence. We randomly sampled 100 instances to compute the mean of each metric in order to evaluate the effect of hyper-parameters λ1, λ2, λ3, λ4 associated with each property-based loss. |
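
The evaluation step quoted under Experiment Setup (randomly sample 100 instances, then take the mean of each metric) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function `mean_metrics_on_sample`, the metric names, and the toy data are all hypothetical, and the actual procedure lives in the linked repository.

```python
import random
from statistics import mean


def mean_metrics_on_sample(instances, metric_fns, sample_size=100, seed=0):
    """Average each metric over a random sample of instances.

    `instances` is a list of per-example records and `metric_fns` maps a
    metric name to a function of one record. Both are hypothetical
    stand-ins for whatever the real evaluation pipeline produces.
    """
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    k = min(sample_size, len(instances))
    sample = rng.sample(instances, k)  # sampling without replacement
    return {name: mean(fn(x) for x in sample) for name, fn in metric_fns.items()}


# Toy records (assumed shape): anomaly score before and after a repair.
instances = [
    {"score_before": 1.0 + 0.01 * i, "score_after": 0.5} for i in range(1000)
]
metric_fns = {
    "score_reduction": lambda x: x["score_before"] - x["score_after"],
    "score_after": lambda x: x["score_after"],
}

result = mean_metrics_on_sample(instances, metric_fns)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

To study the effect of one hyper-parameter, the same sampled subset would presumably be reused across the candidate values of λ1 through λ4 so that differences in the means reflect the setting rather than the sample; fixing `seed` is one simple way to arrange that.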