Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
On the Adversarial Vulnerability of Label-Free Test-Time Adaptation
Authors: Shahriar Rifat, Jonathan Ashdown, Michael De Lucia, Ananthram Swami, Francesco Restuccia
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments on CIFAR10-C, CIFAR100-C, and ImageNet-C, we demonstrate that our proposed approach closely matches the performance of state-of-the-art attack benchmarks, even without access to labeled samples. In certain cases, our approach generates stronger attacks, e.g., more than 4% higher error rate on CIFAR10-C. Source code for the experiments is available at https://github.com/Restuccia-Group/tta-adv.git. |
| Researcher Affiliation | Collaboration | Shahriar Rifat, Jonathan Ashdown, Michael De Lucia, Ananthram Swami and Francesco Restuccia; Northeastern University, United States; DEVCOM Army Research Laboratory, United States; Air Force Research Laboratory, United States |
| Pseudocode | Yes | Algorithm 1: FCA Algorithm |
| Open Source Code | Yes | Source code for the experiments is available at https://github.com/Restuccia-Group/tta-adv.git. |
| Open Datasets | Yes | We leverage three primary benchmark datasets typically used for TTA performance evaluation, i.e., CIFAR10-C, CIFAR100-C, and ImageNet-C. We directly obtain the CIFAR10-C and CIFAR100-C test dataset from Robustbench (Croce et al., 2020). For ImageNet-C, we use the provided data by (Hendrycks & Dietterich, 2019). |
| Dataset Splits | Yes | Unless otherwise specified, we use a test batch size of 200 for each trial where 20% samples are selected as compromised ones |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions models like ResNet-32 and ResNet-50, and refers to "pytorch-cifar-models" and "torchvision(resnet50-v2)", but it does not specify versions for general software dependencies like Python, PyTorch, or CUDA, which are needed for replication. |
| Experiment Setup | Yes | Unless otherwise specified, we use a test batch size of 200 for each trial where 20% samples are selected as compromised ones, adversarial learning rate α = 2/255, perturbation constraint ϵ = 8/255 and iteration steps for attack to be 100. |
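The Dataset Splits and Experiment Setup rows describe a concrete attack configuration. The Python sketch below illustrates it under two stated assumptions: compromised samples are chosen uniformly at random within each test batch, and each iteration is a standard L∞ projected-gradient (PGD-style) sign step. The paper's label-free attack objective (Algorithm 1) is not reproduced; `grad_fn` is a hypothetical placeholder, and the helper names `split_batch` and `linf_pgd` are illustrative, not taken from the authors' repository.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Values quoted in the Dataset Splits / Experiment Setup rows.
BATCH_SIZE = 200
COMPROMISED_FRACTION = 0.20
ALPHA = 2 / 255    # per-step size ("adversarial learning rate")
EPSILON = 8 / 255  # L-infinity perturbation budget
STEPS = 100


def split_batch(batch_size: int, fraction: float,
                seed: int = 0) -> Tuple[List[int], List[int]]:
    """Pick compromised vs. benign indices within one test batch.

    Assumption (hypothetical): compromised samples are drawn uniformly at
    random; the paper's actual selection rule is not given in this report.
    """
    n_compromised = round(batch_size * fraction)
    rng = random.Random(seed)
    compromised = sorted(rng.sample(range(batch_size), n_compromised))
    chosen = set(compromised)
    benign = [i for i in range(batch_size) if i not in chosen]
    return compromised, benign


def _sign(v: float) -> float:
    """Return +1.0, -1.0, or 0.0 depending on the sign of v."""
    return float((v > 0) - (v < 0))


def linf_pgd(
    x_clean: Sequence[float],
    grad_fn: Callable[[List[float]], Sequence[float]],
    alpha: float = ALPHA,
    eps: float = EPSILON,
    steps: int = STEPS,
) -> List[float]:
    """Generic L-infinity projected sign-gradient ascent (PGD-style).

    grad_fn is a hypothetical stand-in for the gradient of the paper's
    label-free attack objective, which is not reproduced here.
    """
    x_adv = list(x_clean)
    for _ in range(steps):
        grad = grad_fn(x_adv)
        for i, (x0, g) in enumerate(zip(x_clean, grad)):
            stepped = x_adv[i] + alpha * _sign(g)
            # Project back into the eps-ball around the clean input...
            stepped = min(max(stepped, x0 - eps), x0 + eps)
            # ...and keep the value in the valid pixel range [0, 1].
            x_adv[i] = min(max(stepped, 0.0), 1.0)
    return x_adv


compromised, benign = split_batch(BATCH_SIZE, COMPROMISED_FRACTION)
print(len(compromised), len(benign))  # 40 160

# Toy objective whose gradient is +1 everywhere ("raise every pixel").
x_clean = [0.5, 0.98, 0.0]
x_adv = linf_pgd(x_clean, lambda x: [1.0] * len(x))
print(x_adv)
```

With a batch of 200 and a 20% compromised fraction, 40 samples are perturbed and 160 stay clean. The sign step is the usual choice under an L∞ budget because each update moves every coordinate by exactly α before projection. With α = 2/255 and ε = 8/255, a coordinate can reach the budget boundary after four steps, so the remaining iterations refine the perturbation inside the ball.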