Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Unmasking Puppeteers: Leveraging Biometric Leakage to Expose Impersonation in AI-Based Videoconferencing

Authors: Danial Samadi Vahdati, Tai D. Nguyen, Ekta Prashnani, Koki Nagano, David Luebke, Orazio Gallo, Matthew Stamm

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments on multiple talking-head generation models show that our method consistently outperforms existing puppeteering defenses, operates in real-time, and shows strong generalization to out-of-distribution scenarios. Extensive experiments across many generators and datasets show that our method achieves state-of-the-art detection performance while working in real-time. In summary, our contributions are as follows. ... Through extensive experiments across fifteen generator/dataset combinations, our approach achieves state-of-the-art puppeteering detection in real time, demonstrating practical viability for deployment in bandwidth-constrained videoconferencing systems. ... 5 Experiments and Results ... 6 Ablation Study
Researcher Affiliation	Collaboration	Danial Samadi Vahdati¹, Tai Duc Nguyen¹, Koki Nagano², David Luebke², Orazio Gallo², Ekta Prashnani², Matthew Stamm¹ (¹Drexel University, ²NVIDIA)
Pseudocode	No	The paper describes its methodology using textual descriptions, equations (e.g., Eq. 1-9), and figures, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	Our code and trained models are available at https://github.com/MISLresearch/Unmasking-Puppeteers-Neurips25.
Open Datasets	Yes	We conduct our experiments using the NVFAIR [11] pooled dataset because it incorporates a large set of recorded video-conference calls in a controlled environment. This dataset includes three subsets: (1) NVIDIA VC [11] (natural video calls), (2) CREMA-D [55] (studio-recorded expressions), and (3) RAVDESS [54] (studio-recorded emotional speech).
Dataset Splits	Yes	Table 3: Statistics of the datasets and generated data used in this paper. NVIDIA-VC (NVC) [11]: 46 IDs; authorized-use videos 1,331 train / 439 test / 1,770 total; puppeteered videos 41,261 train / 3,951 test / 45,212 total. RAVDESS (RAV) [54]: 24 IDs; authorized-use videos 704 train / 264 test / 968 total; puppeteered videos 10,560 train / 1,320 test / 11,880 total. CREMA-D (CRD) [55]: 91 IDs; authorized-use videos 5,154 train / 1,558 test / 6,712 total; puppeteered videos 319,548 train / 28,044 test / 347,592 total.
Hardware Specification	Yes	Computational Efficiency. We benchmarked our method on an RTX 3090 GPU, where it achieved on average 75 FPS, well above the 60 FPS real-time threshold, while having under 1M total parameters.
Software Dependencies	No	The paper describes the model architecture and training protocol but does not explicitly list any specific software dependencies with version numbers (e.g., Python, PyTorch, or other libraries).
Experiment Setup	Yes	Our key insight is that the pose-and-expression embeddings already transmitted by modern talking-head systems leak subtle, but reproducible, biometric signatures [22, 23]. We learn a compact Enhanced Biometric-Leakage (EBL) space in which identity cues are amplified while pose and expression variance is actively suppressed. A pose-conditioned contrastive loss drives this separation, and a lightweight temporal LSTM aggregates evidence to yield stable, millisecond-level decisions. ... The pose-conditioned Large-Margin Cosine Loss (PC-LMCL) is L_B = L_P + λ·L_N, with a single hyper-parameter λ controlling repulsion strength. ... We aggregate similarity scores over a window of W consecutive frames and feed these into an LSTM. ... A 40-frame window (≈1.3 s at 30 fps) captures sufficient temporal biometric information...
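The counts in the Dataset Splits row can be sanity-checked with a few lines of arithmetic. The sketch below transcribes the Table 3 figures as quoted above (the dictionary layout and the helper name `check_and_summarize` are hypothetical, not from the paper's repository); it only confirms that train + test equals the reported total for each subset and reports the train fraction, and says nothing about how the authors actually split identities or videos.

```python
from typing import Dict, Tuple

# Counts transcribed from Table 3 as quoted in the Dataset Splits row:
# each tuple is (train, test, total). Layout is a hypothetical convenience.
TABLE_3 = {
    "NVIDIA-VC (NVC)": {"authorized": (1331, 439, 1770), "puppeteered": (41261, 3951, 45212)},
    "RAVDESS (RAV)": {"authorized": (704, 264, 968), "puppeteered": (10560, 1320, 11880)},
    "CREMA-D (CRD)": {"authorized": (5154, 1558, 6712), "puppeteered": (319548, 28044, 347592)},
}


def check_and_summarize(table) -> Dict[Tuple[str, str], float]:
    """Raise if any row's train + test != total; otherwise return train fractions."""
    summary = {}
    for name, row in table.items():
        for kind in ("authorized", "puppeteered"):
            train, test, total = row[kind]
            if train + test != total:
                raise ValueError(f"{name} {kind}: {train} + {test} != {total}")
            summary[(name, kind)] = train / total
    return summary


if __name__ == "__main__":
    for (name, kind), frac in check_and_summarize(TABLE_3).items():
        print(f"{name:16s} {kind:12s} train fraction = {frac:.3f}")
```

All six rows pass the check, so the quoted totals are internally consistent.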
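The Experiment Setup row describes aggregating per-frame similarity scores over a window of W consecutive frames before an LSTM, with W = 40 at 30 fps, and the Hardware Specification row reports 75 FPS against a 60 FPS threshold. The sketch below illustrates only the windowing and the timing arithmetic; the stride of 1 frame and the function names are assumptions for illustration, and the LSTM itself is not reproduced here.

```python
from typing import List, Sequence


def sliding_windows(scores: Sequence[float], window: int = 40,
                    stride: int = 1) -> List[List[float]]:
    """Split per-frame similarity scores into overlapping windows of `window`
    consecutive frames. The stride is a hypothetical choice, not from the paper."""
    if window <= 0 or stride <= 0:
        raise ValueError("window and stride must be positive")
    # Returns an empty list when fewer than `window` scores are available.
    return [list(scores[start:start + window])
            for start in range(0, len(scores) - window + 1, stride)]


def window_duration_s(window: int, fps: float) -> float:
    """Wall-clock span covered by one window, in seconds."""
    return window / fps


def frame_budget_ms(fps: float) -> float:
    """Time available per frame at a given frame rate, in milliseconds."""
    return 1000.0 / fps


if __name__ == "__main__":
    scores = [0.9] * 100  # placeholder per-frame similarity scores
    print(len(sliding_windows(scores)))             # 61 windows of 40 frames
    print(round(window_duration_s(40, 30), 2))      # 1.33 s at 30 fps
    print(round(frame_budget_ms(75), 1), round(frame_budget_ms(60), 1))  # 13.3 16.7
```

In this framing, each window would be one input sequence to the temporal model, and 75 FPS corresponds to roughly 13.3 ms per frame versus the 16.7 ms allowed at 60 FPS.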
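The Experiment Setup row quotes the training objective only as L_B = L_P + λ·L_N, an attraction term plus a λ-weighted repulsion term. The exact pose-conditioned LMCL terms are not given here, so the sketch below swaps in a generic hinge-style cosine objective with the same two-term structure (the same form as PyTorch's CosineEmbeddingLoss): L_P averages 1 − cos over same-identity embeddings and L_N averages max(0, cos − margin) over other-identity embeddings. The margin value, function names, and the omission of pose conditioning are all assumptions; this is not the authors' loss.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)


def two_term_cosine_loss(anchor: Sequence[float],
                         positives: Sequence[Sequence[float]],
                         negatives: Sequence[Sequence[float]],
                         lam: float = 1.0, margin: float = 0.2) -> float:
    """Hypothetical L = L_P + lam * L_N with a hinge-style cosine objective.

    L_P: mean(1 - cos(anchor, p)) over same-identity embeddings (attraction).
    L_N: mean(max(0, cos(anchor, n) - margin)) over other-identity embeddings
    (repulsion). Pose conditioning from the paper is deliberately omitted.
    """
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    l_p = sum(1.0 - cosine(anchor, p) for p in positives) / len(positives)
    l_n = sum(max(0.0, cosine(anchor, n) - margin) for n in negatives) / len(negatives)
    return l_p + lam * l_n
```

For example, an anchor [1, 0] with positive [1, 0] and negative [0, 1] gives a loss of 0.0, while a negative identical to the anchor adds λ·(1 − margin), so larger λ strengthens repulsion exactly as the quoted single hyper-parameter does.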