Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

CLIPTTA: Robust Contrastive Vision-Language Test-Time Adaptation

Authors: Marc Lafon, Gustavo Vargas Hakim, Clément Rambour, Christian Desrosiers, Nicolas THOME

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Evaluated on 75 datasets spanning diverse distribution shifts, CLIPTTA consistently outperforms entropy-based objectives and is highly competitive with state-of-the-art TTA methods, outperforming them on a large number of datasets and exhibiting more stable performance across diverse shifts. Source code is available at: CLIPTTA Repository.
Researcher Affiliation	Academia	1Conservatoire National des Arts et Métiers, CEDRIC, F-75141 Paris, France 2Sorbonne Université, CNRS, ISIR, F-75005 Paris, France 3ETS Montreal, Canada 4Institut universitaire de France (IUF)
Pseudocode	No	The paper includes theoretical analysis and mathematical derivations of gradients but does not present any explicit pseudocode or algorithm blocks. The methods are described narratively and with equations.
Open Source Code	Yes	Source code is available at: CLIPTTA Repository.
Open Datasets	Yes	Datasets. CLIPTTA is evaluated on four families of adaptation benchmarks: corruptions (CIFAR10/100-C, ImageNet-C) with 15 perturbations, domain shifts (VisDA-C, PACS, OfficeHome, ImageNet Domains), and semantic datasets, including coarse (CIFAR-10/100) and fine-grained classification (ImageNet, and 10 datasets from the CLIP zero-shot suite). In total, this represents a thorough evaluation over 75 datasets. A detailed description is provided in Appendix C.2. In open-set TTA, SVHN and Places-365 serve as OOD counterparts for CIFAR-10/100 and ImageNet, respectively.
Dataset Splits	No	The paper mentions using various datasets (e.g., CIFAR-10, Imagenet) but does not explicitly detail the training/test/validation splits used for these datasets, nor does it refer to specific predefined splits or provide sample counts for each split. It states that Imagenet has '50,000 images in total' but no split information.
Hardware Specification	Yes	Experiments were performed on two NVIDIA V100 32GB GPUs.
Software Dependencies	No	The paper mentions providing a library in "Python/PyTorch" in the checklist, but it does not specify the version numbers for Python, PyTorch, or any other key software dependencies or libraries used in the implementation.
Experiment Setup	Yes	Adaptation is performed with batches of 128 images using the Adam optimizer and a learning rate of 10^-4 over 10 iterations. Experiments are conducted in a non-episodic manner, i.e., without restoring the model's parameters after each batch. Following the standard TTA protocol, we adapt the affine parameters of the visual encoder's normalization layers. In the open-set setting, we add 128 OOD images per batch, as done in prior work [28, 27]. The regularization and OCE loss weights are set to λreg = 1 and λoce = 1, respectively.
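
For readers who want the Experiment Setup row in machine-readable form, the settings quoted there could be collected roughly as follows. This is a minimal sketch, not the authors' code: the class name `TTAConfig`, its field names, and the helper `images_per_batch` are hypothetical, and the learning rate is read as 10^-4 on the assumption that the exponent sign was lost in extraction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TTAConfig:
    """Hypothetical container for the adaptation settings quoted in the table."""

    batch_size: int = 128        # test images per adaptation batch
    optimizer: str = "adam"      # optimizer named in the quoted setup
    learning_rate: float = 1e-4  # assumption: "10 4" in the extract means 10^-4
    iterations: int = 10         # as reported; the quote does not say per batch or overall
    episodic: bool = False       # non-episodic: parameters are not restored after each batch
    trainable: str = "affine parameters of the visual encoder's normalization layers"
    ood_per_batch: int = 128     # OOD images added per batch (open-set setting only)
    lambda_reg: float = 1.0      # regularization loss weight
    lambda_oce: float = 1.0      # OCE loss weight

    def images_per_batch(self, open_set: bool) -> int:
        """Total images in one batch, counting added OOD samples in the open-set setting."""
        return self.batch_size + (self.ood_per_batch if open_set else 0)


cfg = TTAConfig()
```

Under these assumptions, `cfg.images_per_batch(open_set=True)` gives the combined batch size once the 128 OOD images are added, while the closed-set case keeps the 128 in-distribution images only.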