Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

On the Adversarial Vulnerability of Label-Free Test-Time Adaptation

Authors: Shahriar Rifat, Jonathan Ashdown, Michael De Lucia, Ananthram Swami, Francesco Restuccia

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive experiments on CIFAR10-C, CIFAR100-C, and ImageNet-C, we demonstrate that our proposed approach closely matches the performance of state-of-the-art attack benchmarks, even without access to labeled samples. In certain cases, our approach generates stronger attacks, e.g., more than 4% higher error rate on CIFAR10-C. Source code for the experiments is available at https://github.com/Restuccia-Group/tta-adv.git.
Researcher Affiliation	Collaboration	Shahriar Rifat, Jonathan Ashdown, Michael De Lucia, Ananthram Swami, and Francesco Restuccia; Northeastern University, United States; DEVCOM Army Research Laboratory, United States; Air Force Research Laboratory, United States
Pseudocode	Yes	Algorithm 1: FCA Algorithm
Open Source Code	Yes	Source code for the experiments is available at https://github.com/Restuccia-Group/tta-adv.git.
Open Datasets	Yes	We leverage three primary benchmark datasets typically used for TTA performance evaluation, i.e., CIFAR10-C, CIFAR100-C, and ImageNet-C. We directly obtain the CIFAR10-C and CIFAR100-C test dataset from Robustbench (Croce et al., 2020). For ImageNet-C, we use the provided data by (Hendrycks & Dietterich, 2019).
Dataset Splits	Yes	Unless otherwise specified, we use a test batch size of 200 for each trial where 20% samples are selected as compromised ones
Hardware Specification	No	The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper mentions models like ResNet-32 and ResNet-50, and refers to "pytorch-cifar-models" and "torchvision(resnet50-v2)", but it does not specify versions for general software dependencies like Python, PyTorch, or CUDA, which are needed for replication.
Experiment Setup	Yes	Unless otherwise specified, we use a test batch size of 200 for each trial where 20% samples are selected as compromised ones, adversarial learning rate α = 2/255, perturbation constraint ϵ = 8/255 and iteration steps for attack to be 100.
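
The Experiment Setup values above can be read as the parameters of an iterative, norm-bounded attack. The following is only a minimal sketch of how those numbers fit together, not the paper's FCA algorithm: it assumes an L-infinity constraint and a signed-gradient update (neither is stated on this page), and `pick_compromised`, `projected_step`, `run_attack`, and `grad_fn` are hypothetical names introduced for illustration.

```python
import random

# Values quoted in the Experiment Setup row.
BATCH_SIZE = 200             # test batch size per trial
COMPROMISED_FRACTION = 0.20  # share of samples the attacker controls
ALPHA = 2 / 255              # adversarial learning rate (step size)
EPSILON = 8 / 255            # perturbation constraint
ATTACK_STEPS = 100           # attack iterations


def pick_compromised(batch_size=BATCH_SIZE, fraction=COMPROMISED_FRACTION, seed=0):
    """Choose which batch indices are compromised (hypothetical helper)."""
    rng = random.Random(seed)
    count = round(batch_size * fraction)
    return sorted(rng.sample(range(batch_size), count))


def sign(value):
    """Return -1, 0, or 1 depending on the sign of value."""
    return (value > 0) - (value < 0)


def projected_step(x_adv, x_clean, grad, alpha=ALPHA, epsilon=EPSILON):
    """One signed-gradient step, clipped to the epsilon ball and to [0, 1].

    Assumption: the constraint is an L-infinity ball around the clean input.
    """
    result = []
    for xa, xc, g in zip(x_adv, x_clean, grad):
        stepped = xa + alpha * sign(g)
        stepped = min(max(stepped, xc - epsilon), xc + epsilon)
        result.append(min(max(stepped, 0.0), 1.0))
    return result


def run_attack(x_clean, grad_fn, steps=ATTACK_STEPS):
    """Iterate the projected step; grad_fn stands in for the attack objective's gradient."""
    x_adv = list(x_clean)
    for _ in range(steps):
        x_adv = projected_step(x_adv, x_clean, grad_fn(x_adv))
    return x_adv
```

With these settings, 40 of the 200 samples in each batch are compromised, and since α is a quarter of ϵ, a perturbation that keeps moving in one direction reaches the constraint boundary after four steps; the remaining iterations can only move within the ball.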