Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning to Generate Gradients for Test-Time Adaptation via Test-Time Training Layers

Authors: Qi Deng, Shuaicheng Niu, Ronghao Zhang, Yaofo Chen, Runhao Zeng, Jian Chen, Xiping Hu

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Promising results on ImageNet-C/R/Sketch/A indicate that our method surpasses current state-of-the-art methods with fewer updates, less data, and significantly shorter adaptation times. Compared with the previous SOTA SAR, we achieve a 7.4% accuracy improvement and 4.2× faster adaptation speed on ImageNet-C. [...] Extensive results indicate that our method surpasses existing SOTAs with fewer updates, less data, and significantly shorter adaptation times. [...] Experiments in Table 6 demonstrate that 128 images are sufficient for our method to achieve excellent performance.
Researcher Affiliation | Academia | 1) South China University of Technology, 2) Nanyang Technological University, 3) Artificial Intelligence Research Institute, Shenzhen MSU-BIT University. {dengqi.kei; niushuaicheng; zhangronghao16; runhaozeng.cs}@gmail.com; EMAIL;
Pseudocode | Yes | We summarize the overall pseudo-code of our method in Algorithm 1. Algorithm 1: The pre-training/TTA pipeline of MGTTA.
Open Source Code | Yes | Code: https://github.com/keikeiqi/MGTTA
Open Datasets | Yes | We conduct experiments on four benchmark datasets: 1) ImageNet-C (Hendrycks and Dietterich 2019) contains corrupted images in 15 types across 4 main categories, each with 5 severity levels. [...] 2) ImageNet-R (Hendrycks et al. 2021a) contains various artistic renditions of 200 ImageNet classes. 3) ImageNet-Sketch (Wang et al. 2019) includes sketch-style images representing 1,000 ImageNet classes. 4) ImageNet-A (Hendrycks et al. 2021b) consists of natural adversarial examples.
Dataset Splits | Yes | For pre-training MGG, we randomly select 128 unlabeled samples from the ImageNet-C validation set. [...] In our main experiments, we randomly sample 128 images without labels from the held-out validation set of ImageNet-C as the training set of MGG, and then test the trained MGG on all ImageNet-C test sets and the other ImageNet variants. [...] Total #batches is 782, with batch size 64.
Hardware Specification | Yes | Table 5: Wall-clock runtime for processing 50,000 images of ImageNet-C on an RTX 4090 GPU, and Acc. averaged over 15 corruptions.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers.
Experiment Setup | Yes | The learning rate is set to 1e-4 for θ and 1e-2 for ϕr. We update θ and ϕr for T = 2,000 iterations with a batch size of 2. The GML hidden size is set to 8. During TTA, the batch size is 64, ϕr is fixed, and the learning rate for θ is set to 1e-3.
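For reproduction, the reported hyperparameters can be collected into a single reference structure. The sketch below is illustrative only: the names `PRETRAIN_CFG`, `TTA_CFG`, and `total_pretrain_images` are hypothetical and do not come from the MGTTA codebase; the values are those quoted in the rows above.

```python
# Hypothetical summary of the hyperparameters reported for MGTTA.
# Values are taken from the paper excerpts above; dict and function
# names are illustrative, not from the authors' repository.

PRETRAIN_CFG = {
    "lr_theta": 1e-4,              # learning rate for model parameters θ
    "lr_phi_r": 1e-2,              # learning rate for gradient-generator parameters ϕr
    "iterations": 2000,            # T = 2,000 pre-training iterations
    "batch_size": 2,
    "gml_hidden_size": 8,          # hidden size of the GML layer
    "num_unlabeled_samples": 128,  # images sampled from the ImageNet-C validation set
}

TTA_CFG = {
    "batch_size": 64,
    "lr_theta": 1e-3,              # only θ is updated at test time
    "phi_r_frozen": True,          # ϕr is fixed during TTA
}

def total_pretrain_images(cfg: dict) -> int:
    """Total (possibly repeated) image presentations during pre-training."""
    return cfg["iterations"] * cfg["batch_size"]
```

Note the data efficiency implied by these numbers: with only 128 unlabeled source images, pre-training cycles through the pool many times (2,000 iterations × batch size 2 = 4,000 presentations).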