Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers

Authors: Qi Deng, Shuaicheng Niu, Ronghao Zhang, Yaofo Chen, Runhao Zeng, Jian Chen, Xiping Hu

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Promising results on ImageNet-C/R/Sketch/A indicate that our method surpasses current state-of-the-art methods with fewer updates, less data, and significantly shorter adaptation times. Compared with the previous SOTA SAR, we achieve a 7.4% accuracy improvement and 4.2× faster adaptation speed on ImageNet-C. [...] Extensive results indicate that our method surpasses existing SOTAs with fewer updates, less data, and significantly shorter adaptation times. [...] Experiments in Table 6 demonstrate that 128 images are sufficient for our method to achieve excellent performance.
Researcher Affiliation | Academia | 1) South China University of Technology, 2) Nanyang Technological University, 3) Artificial Intelligence Research Institute, Shenzhen MSU-BIT University. {dengqi.kei; niushuaicheng; zhangronghao16; runhaozeng.cs}@gmail.com; EMAIL;
Pseudocode | Yes | We summarize the overall pseudo-code of our method in Algorithm 1. Algorithm 1: The pre-training/TTA pipeline of MGTTA.
Open Source Code | Yes | Code: https://github.com/keikeiqi/MGTTA
Open Datasets | Yes | We conduct experiments on four benchmark datasets: 1) ImageNet-C (Hendrycks and Dietterich 2019) contains corrupted images in 15 types across 4 main categories, each with 5 severity levels. [...] 2) ImageNet-R (Hendrycks et al. 2021a) contains various artistic renditions of 200 ImageNet classes. 3) ImageNet-Sketch (Wang et al. 2019) includes sketch-style images representing 1,000 ImageNet classes. 4) ImageNet-A (Hendrycks et al. 2021b) consists of natural adversarial examples.
Dataset Splits | Yes | For pre-training MGG, we randomly select 128 unlabeled samples from the ImageNet-C validation set. [...] In our main experiments, we randomly sample 128 images without labels from the held-out validation set of ImageNet-C as the training set of MGG, and then test the trained MGG on all ImageNet-C test sets and the other ImageNet variants. [...] Total #batches is 782, with batch size 64.
Hardware Specification | Yes | Table 5: Wall-clock runtime for processing 50,000 images of ImageNet-C on an RTX 4090 GPU, and Acc. averaged over 15 corruptions.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | The learning rate is set to 1e-4 for θ and 1e-2 for ϕr. We update θ and ϕr for T = 2,000 iterations with a batch size of 2. The GML hidden size is set to 8. During TTA, the batch size is 64, ϕr is fixed, and the learning rate for θ is set to 1e-3.
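For reproduction, the reported hyperparameters can be collected into a single reference structure. The sketch below is illustrative only: the names `PRETRAIN_CFG`, `TTA_CFG`, and `total_pretrain_images` are hypothetical and do not come from the MGTTA codebase; the values are those quoted in the rows above.

```python
# Hypothetical summary of the hyperparameters reported for MGTTA.
# Values are taken from the paper excerpts above; dict and function
# names are illustrative, not from the authors' repository.

PRETRAIN_CFG = {
    "lr_theta": 1e-4,              # learning rate for model parameters θ
    "lr_phi_r": 1e-2,              # learning rate for gradient-generator parameters ϕr
    "iterations": 2000,            # T = 2,000 pre-training iterations
    "batch_size": 2,
    "gml_hidden_size": 8,          # hidden size of the GML layer
    "num_unlabeled_samples": 128,  # images sampled from the ImageNet-C validation set
}

TTA_CFG = {
    "batch_size": 64,
    "lr_theta": 1e-3,              # only θ is updated at test time
    "phi_r_frozen": True,          # ϕr is fixed during TTA
}

def total_pretrain_images(cfg: dict) -> int:
    """Total (possibly repeated) image presentations during pre-training."""
    return cfg["iterations"] * cfg["batch_size"]
```

Note the data efficiency implied by these numbers: with only 128 unlabeled source images, pre-training cycles through the pool many times (2,000 iterations × batch size 2 = 4,000 presentations).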