Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.

Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing

Authors: Danial Samadi Vahdati, Tai D. Nguyen, Ekta Prashnani, Koki Nagano, David Luebke, Orazio Gallo, Matthew Stamm

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments on multiple talking-head generation models show that our method consistently outperforms existing puppeteering defenses, operates in real-time, and shows strong generalization to out-of-distribution scenarios. Extensive experiments across many generators and datasets show that our method achieves state-of-the-art detection performance while working in real-time. In summary, our contributions are as follows. ... Through extensive experiments across fifteen generator/dataset combinations, our approach achieves state-of-the-art puppeteering detection in real time, demonstrating practical viability for deployment in bandwidth-constrained videoconferencing systems. ... 5 Experiments and Results ... 6 Ablation Study
Researcher Affiliation Collaboration Danial Samadi Vahdati¹, Tai Duc Nguyen¹, Koki Nagano², David Luebke², Orazio Gallo², Ekta Prashnani², Matthew Stamm¹; ¹Drexel University, ²NVIDIA
Pseudocode No The paper describes its methodology using textual descriptions, equations (e.g., Eq. 1-9), and figures, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code Yes Our code and trained models are available at https://github.com/MISLresearch/Unmasking-Puppeteers-Neurips25.
Open Datasets Yes We conduct our experiments using the NVFAIR [11] pooled dataset because it incorporates together a large set recorded video-conference calls in a controlled setting environment. This dataset includes three subsets: (1) NVIDIA VC [11] (natural video calls), (2) CREMA-D [55] (studio-recorded expressions), and (3) RAVDESS [54] (studio-recorded emotional speech).
Dataset Splits Yes Table 3: Statistics of the datasets and generated data used in this paper. Columns: Dataset; # of IDs; # Authorized Use Videos (Train / Test / Total); # Puppeteered Videos (Train / Test / Total). NVIDIA-VC (NVC) [11]: 46 IDs; authorized 1,331 / 439 / 1,770; puppeteered 41,261 / 3,951 / 45,212. RAVDESS (RAV) [54]: 24 IDs; authorized 704 / 264 / 968; puppeteered 10,560 / 1,320 / 11,880. CREMA-D (CRD) [55]: 91 IDs; authorized 5,154 / 1,558 / 6,712; puppeteered 319,548 / 28,044 / 347,592.
Hardware Specification Yes Computational Efficiency. We benchmarked our method on an RTX 3090 GPU, where it achieved on average 75 FPS, well above the 60 FPS real-time threshold, while having under 1M total number of parameters.
Software Dependencies No The paper describes the model architecture and training protocol but does not explicitly list any specific software dependencies with version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup Yes Our key insight is that the pose-and-expression embeddings already transmitted by modern talking-head systems leak subtle, but reproducible, biometric signatures [22, 23]. We learn a compact Enhanced Biometric-Leakage (EBL) space in which identity cues are amplified while pose and expression variance is actively suppressed. A pose-conditioned contrastive loss drives this separation, and a lightweight temporal LSTM aggregates evidence to yield stable, millisecond-level decisions. ... The pose-conditioned Large-Margin Cosine Loss (PC-LMCL) is L_B = L_P + λ L_N, with a single hyper-parameter λ controlling repulsion strength. ... We aggregate similarity scores over a window of W consecutive frames and feed these into an LSTM. ... A 40-frame window (≈1.3 s at 30 fps) captures sufficient temporal biometric information...
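
The figures quoted in the rows above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not code from the paper or its repository: the dictionary layout and all function names are hypothetical, and the numbers are copied from the Dataset Splits, Hardware Specification, and Experiment Setup rows. It checks that each Train + Test pair sums to the reported Total, that a 40-frame window at 30 fps lasts 40/30 ≈ 1.33 s, and that 75 FPS corresponds to a shorter per-frame time than the 60 FPS threshold.

```python
# Illustrative sanity checks on figures quoted in the table above.
# All names here are hypothetical; only the numbers come from the source.

# (train, test, total) video counts per dataset, as reported in Table 3.
SPLITS = {
    "NVIDIA-VC": {"authorized": (1331, 439, 1770),
                  "puppeteered": (41261, 3951, 45212)},
    "RAVDESS": {"authorized": (704, 264, 968),
                "puppeteered": (10560, 1320, 11880)},
    "CREMA-D": {"authorized": (5154, 1558, 6712),
                "puppeteered": (319548, 28044, 347592)},
}


def split_totals_consistent(splits):
    """Return True if train + test equals total for every reported row."""
    return all(
        train + test == total
        for rows in splits.values()
        for (train, test, total) in rows.values()
    )


def window_seconds(frames, fps):
    """Duration in seconds of a window of `frames` frames at `fps`."""
    return frames / fps


def frame_time_ms(fps):
    """Average time per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


if __name__ == "__main__":
    print("Split totals consistent:", split_totals_consistent(SPLITS))
    print("40-frame window at 30 fps (s):", round(window_seconds(40, 30), 2))
    print("Per-frame time at 75 FPS (ms):", round(frame_time_ms(75), 2))
    print("Per-frame time at 60 FPS (ms):", round(frame_time_ms(60), 2))
```

Note that the window duration describes how much video the LSTM sees before deciding, while the per-frame time describes throughput; the two figures are independent of each other.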