Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Lie Detector: Unified Backdoor Detection via Cross-Examination Framework

Authors: Xuan Wang, Siyuan Liang, Dongping Liao, Han Fang, Aishan Liu, Xiaochun Cao, Yu-liang Lu, Ee-Chien Chang, Xitong Gao

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that our method achieves superior detection performance, improving accuracy by 4.4%, 1.7%, and 10.6% over SOTA baselines across supervised, self-supervised, and autoregressive learning tasks, respectively. ... We thoroughly assess Lie Detector's performance in supervised, self-supervised, and autoregressive settings. Our method works better than current methods, with relative increases of 4.4%, 1.7%, and 10.6% for SL, SSL, and AR, respectively. ... We compare our Lie Detector with 7 SOTA detection methods against 4 representative attacks on CIFAR-10 and Tiny ImageNet. Lie Detector consistently achieves 100.0% DSR across all attacks and datasets, with average gains of 4.4% over the next-best methods and near-zero FPR.
Researcher Affiliation | Academia | Xuan Wang, National University of Defense Technology; Siyuan Liang, National University of Singapore; Dongping Liao, State Key Lab of IoTSC, CIS Dept., University of Macau; Han Fang, National University of Singapore; Aishan Liu, Beihang University; Xiaochun Cao, Sun Yat-sen University; Yuliang Lu, National University of Defense Technology; Ee-Chien Chang, National University of Singapore; Xitong Gao, Shenzhen Institutes of Advanced Technology, CAS, and Shenzhen University of Advanced Technology
Pseudocode | Yes | Algorithm 1: Lie Detector
Require: models $f_1, f_2$; clean subset $D_s$ and fine-tuning set $D_{ft}$; thresholds $\eta, \gamma$; weights $\alpha, \beta, \lambda$; epochs $T$; optimizer Adam
Ensure: backdoor status of $f_1$ and $f_2$
1: Initialize trigger mask $m$ and pattern $p$ ▷ Stage I: Cross-Model Trigger Reverse
2: for $t = 1$ to $T$ do
3:   for all $(x, y) \in D_s$ do
4:     $x' \leftarrow m \odot p + (1 - m) \odot x$ ▷ Generate poisoned input (Eq. (4))
5:     Compute $\mathcal{L}_{OD}$, $\mathcal{L}_{CKA}$ ▷ Eq. (5), Eq. (6)
6:   end for
7:   $\mathcal{L} \leftarrow \alpha \mathcal{L}_{CKA} + \beta \mathcal{L}_{OD} + \lambda (\|m\|_1 + \|p\|_1)$ ▷ Total loss (Eq. (7))
8:   Update the trigger $(m, p)$ by minimizing $\mathcal{L}$ via Adam
9: end for
10: $D_b \leftarrow \{x' = m \odot p + (1 - m) \odot x \mid x \in D_s\}$ ▷ Build poisoned set
11: for $f \in \{f_1, f_2\}$ do
12:   Predict $\hat{y}_i = f(x'_i)$; estimate target $\hat{y}_c$ by averaging ▷ Stage II: Activation-Based Identification
13:   Compute $\mathrm{ASR}(f) = \mathbb{E}[\mathbb{I}(f(x') = \hat{y}_c)]$
14:   if $\mathrm{ASR}(f) > \eta$ then ▷ Stage III: Fine-tuning Sensitivity Analysis
15:     Fine-tune $f$ on $D_{ft}$ to obtain $f'$; compute $\mathrm{ASR}(f')$; set $\Delta\mathrm{ASR} \leftarrow \mathrm{ASR}(f) - \mathrm{ASR}(f')$
16:     return Backdoored if $\Delta\mathrm{ASR} > \gamma$ else return Clean
17:   else
18:     return Clean
19:   end if
20: end for
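Parts of Algorithm 1 can be sketched in NumPy as below. This is a minimal, hypothetical illustration, not the authors' code. It implements the trigger blending of step 4 and the total loss of step 7, with L_CKA and L_OD passed in as precomputed scalars because Eqs. (5) and (6) are not reproduced here. It then implements the Stage II/III decision from softmax outputs of one model before and after fine-tuning (the fine-tuning itself is not shown). All function names are invented. Reading "estimate target by averaging" as the argmax of the mean softmax vector, and reusing the same target class when computing ASR(f'), are assumptions.

```python
import numpy as np

# Hypothetical helpers illustrating Algorithm 1; not the authors' implementation.


def apply_trigger(x, m, p):
    """Step 4: blend pattern p into x under mask m, x' = m*p + (1 - m)*x (elementwise)."""
    return m * p + (1.0 - m) * x


def total_loss(l_cka, l_od, m, p, alpha=0.6, beta=0.3, lam=0.1):
    """Step 7: alpha*L_CKA + beta*L_OD + lam*(||m||_1 + ||p||_1).

    l_cka and l_od are assumed to be precomputed scalars (Eqs. (5)-(6) not modeled).
    """
    return alpha * l_cka + beta * l_od + lam * (np.abs(m).sum() + np.abs(p).sum())


def estimate_target(probs):
    """Step 12 (assumed reading): average softmax outputs over D_b, take the argmax."""
    return int(np.argmax(probs.mean(axis=0)))


def attack_success_rate(probs, target):
    """Step 13: fraction of poisoned inputs predicted as the target class."""
    return float(np.mean(np.argmax(probs, axis=1) == target))


def classify_model(probs_before, probs_after, eta=0.75, gamma=0.2):
    """Stages II-III for one model.

    probs_before / probs_after: (N, C) softmax outputs on the poisoned set D_b
    from the model before and after fine-tuning on D_ft.
    """
    target = estimate_target(probs_before)
    asr = attack_success_rate(probs_before, target)
    if asr <= eta:  # Stage II: ASR not above eta
        return "Clean"
    # Stage III: drop in ASR after fine-tuning, using the same target class
    delta_asr = asr - attack_success_rate(probs_after, target)
    return "Backdoored" if delta_asr > gamma else "Clean"


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((4, 3, 32, 32))   # toy batch of CIFAR-sized images
    m = np.zeros((1, 32, 32))
    m[:, -4:, -4:] = 1.0             # hypothetical 4x4 corner mask
    p = np.ones((3, 32, 32))         # hypothetical all-ones pattern
    x_poisoned = apply_trigger(x, m, p)
    print(x_poisoned.shape, float(x_poisoned[:, :, -1, -1].min()))  # (4, 3, 32, 32) 1.0

    before = np.tile(np.eye(10)[2], (8, 1))                        # all 8 inputs -> class 2
    after = np.vstack([np.eye(10)[2]] * 4 + [np.eye(10)[5]] * 4)   # ASR drops to 0.5
    print(classify_model(before, after))   # Backdoored
    print(classify_model(before, before))  # Clean (no drop after fine-tuning)
```

The thresholds default to the values reported in the Experiment Setup row (η = 0.75, γ = 0.2); Stage I's actual optimization over both models is deliberately left out of this sketch.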
Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: We provide the data and code in the anonymous repository.
Open Datasets | Yes | For supervised learning, we use ResNet18 [15] and VGG16 [42] on CIFAR-10 [21] and Tiny ImageNet [44]. For self-supervised and autoregressive learning, we test CLIP [40] and CoCoOp on ImageNet [7] and Caltech101 [11], while LLaVA [24] and MiniGPT-4 [54] are evaluated on COCO [32], Flickr30k [10], and Flickr8k [16].
Dataset Splits | Yes | Specifically, each dataset is split into training and validation sets at a 90%/10% ratio, with only 10% clean data accessible for detection.
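A 90%/10% split of this kind can be sketched with the Python standard library. This is an illustration under assumptions, not the paper's code: the helper name `split_for_detection` and the fixed seed are hypothetical, and treating the 10% portion as the only clean data the detector may use (the clean subset Ds of Algorithm 1) is our reading of the row above.

```python
import random


def split_for_detection(samples, held_out_fraction=0.10, seed=0):
    """Hypothetical helper: shuffle indices and split samples into (train, held_out).

    held_out is the ~10% portion assumed to be the only clean data available
    to the detector; train is the remaining ~90%.
    """
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)
    n_held_out = round(len(samples) * held_out_fraction)
    held_out = [samples[i] for i in indices[:n_held_out]]
    train = [samples[i] for i in indices[n_held_out:]]
    return train, held_out


if __name__ == "__main__":
    data = list(range(1000))  # stand-in for 1,000 labeled images
    train, held_out = split_for_detection(data)
    print(len(train), len(held_out))  # 900 100
```

Splitting on shuffled indices keeps the two parts disjoint and makes the split reproducible for a given seed.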
Hardware Specification | Yes | All experiments are conducted using PyTorch, with models trained on NVIDIA A100 GPUs.
Software Dependencies | No | All experiments are conducted using PyTorch, with models trained on NVIDIA A100 GPUs.
Experiment Setup | Yes | In our experiments, we use equal numbers of clean and backdoored models. In each evaluation, two models are randomly sampled to form clean-clean, clean-backdoored, or backdoored-backdoored pairs. To ensure a fair comparison, we randomly select 20 model pairs (without repetition) for testing and compute the detection performance by averaging their scores. The trigger is optimized with Adam. We use default hyperparameters for Algorithm 1: γ = 0.2, η = 0.75, α = 0.6, β = 0.3, λ = 0.1, T = 100. ... Table Appendix 2: Training Configuration for Different Datasets and Models
Parameter | CIFAR-10 | Tiny ImageNet | Caltech101 | COCO
Model | ResNet-18 | VGG-16 | CLIP | VLM
Optimizer | Adam | Adam | Adam | Adam
Batch Size | 64 | 128 | 224 | 224
Epochs | 60 | 100 | 100 | 100
Image Size | 32 × 32 | 64 × 64 | 224 × 224 | 224 × 224
Learning Rate | 1 × 10⁻³ | 1 × 10⁻⁴ | 1 × 10⁻³ | 1 × 10⁻³
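The evaluation protocol and default hyperparameters described above can be sketched as follows. This is a hypothetical illustration, not the authors' code: the `LieDetectorConfig` dataclass and the `sample_model_pairs` and `pair_type` helpers are invented names, the model pool is a stand-in list of identifiers, and drawing the 20 pairs as distinct unordered combinations is our reading of "20 model pairs (without repetition)".

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class LieDetectorConfig:
    """Hypothetical container for the default Algorithm 1 hyperparameters."""
    gamma: float = 0.2   # threshold on the ASR drop after fine-tuning
    eta: float = 0.75    # ASR threshold for Stage II
    alpha: float = 0.6   # weight of L_CKA
    beta: float = 0.3    # weight of L_OD
    lam: float = 0.1     # weight of the L1 regularizer on (m, p)
    epochs: int = 100    # T, trigger-optimization epochs


def sample_model_pairs(models, n_pairs=20, seed=0):
    """Draw n_pairs distinct unordered pairs of models (no pair is repeated)."""
    all_pairs = list(itertools.combinations(models, 2))
    return random.Random(seed).sample(all_pairs, n_pairs)


def pair_type(pair, is_backdoored):
    """Label a pair as clean-clean, clean-backdoored, or backdoored-backdoored."""
    n_bad = sum(is_backdoored[name] for name in pair)
    return ("clean-clean", "clean-backdoored", "backdoored-backdoored")[n_bad]


if __name__ == "__main__":
    # Equal numbers of clean and backdoored models (10 each, hypothetical names).
    is_backdoored = {f"clean_{i}": False for i in range(10)}
    is_backdoored.update({f"bd_{i}": True for i in range(10)})
    pairs = sample_model_pairs(sorted(is_backdoored), n_pairs=20)
    for pair in pairs[:3]:
        print(pair, pair_type(pair, is_backdoored))
    print(LieDetectorConfig())
```

Each sampled pair would then be passed to the detector, and the per-pair scores averaged; the scoring step is omitted here because it depends on the full method.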