MECTA: Memory-Economic Continual Test-Time Model Adaptation

Authors: Junyuan Hong, Lingjuan Lyu, Jiayu Zhou, Michael Spranger

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | On three datasets, CIFAR10, CIFAR100, and ImageNet, MECTA improves the accuracy by at least 6% with constrained memory and significantly reduces the memory costs of ResNet50 on ImageNet by at least 70% with comparable accuracy. Our codes can be accessed at https://github.com/SonyAI/MECTA. Also, from Section 5 (Experiments): Datasets and pre-trained models. To evaluate the OOD generalization of models, we adopt three image-classification datasets: the CIFAR10-C, CIFAR100-C (Krizhevsky, 2009) and ImageNet-C (Deng et al., 2009) following previous arts (Niu et al., 2022).
Researcher Affiliation | Collaboration | Junyuan Hong^1, Lingjuan Lyu^2, Jiayu Zhou^1, Michael Spranger^2; ^1 Michigan State University, ^2 Sony AI; {hongju12,jiayuz}@msu.edu, {lingjuan.lv,michael.spranger}@sony.com
Pseudocode | Yes | Finally, we summarize the proposed method in Algorithm 1, where our method includes three hyperparameters to trade off accuracy and memory. Algorithm 1: Memory-Economic Continual Test-time Adaptation (MECTA). (A generic sketch of this kind of adaptation loop appears after this table.)
Open Source Code | Yes | Our codes can be accessed at https://github.com/SonyAI/MECTA.
Open Datasets | Yes | To evaluate the OOD generalization of models, we adopt three image-classification datasets: the CIFAR10-C, CIFAR100-C (Krizhevsky, 2009) and ImageNet-C (Deng et al., 2009) following previous arts (Niu et al., 2022).
Dataset Splits | No | The paper describes a 'lifelong setting' with streaming data and sequential corruptions, but does not explicitly provide train/validation/test splits (percentages, counts, or references to predefined splits) needed for reproduction.
Hardware Specification | Yes | We implement our algorithm using PyTorch 1.12.1, cudatoolkit 11.6 on NVIDIA Tesla T4 GPUs.
Software Dependencies | Yes | We implement our algorithm using PyTorch 1.12.1, cudatoolkit 11.6 on NVIDIA Tesla T4 GPUs.
Experiment Setup | Yes | All test-time adaptation objectives are optimized by stochastic gradient descent (SGD) with a momentum of 0.9. Tent and EATA utilize a batch size of 64 with a learning rate of 0.005 (0.00025) for CIFAR-10 (CIFAR100 and ImageNet). In our implementation, we use 0.0025 (0.0001) as learning rates to stabilize the training with smaller batch sizes. EATA uses 2,000 samples to estimate a Fisher matrix for anti-forgetting regularization. For MECTA, we set the threshold β_th for stopping layer training as 0.0025 for CIFAR100, 0.00125 for CIFAR10, and 0.00125 for ImageNet-C. The cache pruning rate is set to be 0.7 for all datasets. (These values are collected in the configuration sketch after this table.)
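
The Algorithm 1 cited in the Pseudocode row is not reproduced in this report. As context, the sketch below shows the generic Tent-style continual test-time adaptation loop that methods such as MECTA extend: entropy minimization on streaming test batches, updating only normalization-layer affine parameters with SGD. The MECTA-specific forgetting gate, layer-stopping threshold β_th, and cache pruning are deliberately omitted, and all function names are illustrative, not the authors' API.

```python
# Minimal sketch of a Tent-style continual test-time adaptation loop, the
# family of streaming-adaptation methods MECTA builds on. MECTA-specific
# pieces (forgetting gate, beta_th layer stopping, cache pruning) are
# intentionally omitted; see Algorithm 1 in the paper and the authors'
# repository for the actual method. All names here are illustrative.
import torch
import torch.nn as nn


def collect_norm_affine_params(model: nn.Module):
    """Adapt only the affine (weight/bias) parameters of normalization layers."""
    params = []
    for module in model.modules():
        if isinstance(module, (nn.BatchNorm1d, nn.BatchNorm2d, nn.GroupNorm, nn.LayerNorm)):
            for p in (module.weight, module.bias):
                if p is not None:
                    params.append(p)
    return params


def entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Mean prediction entropy of a batch of logits (no labels needed)."""
    log_probs = logits.log_softmax(dim=1)
    return -(log_probs.exp() * log_probs).sum(dim=1).mean()


def adapt_on_stream(model: nn.Module, test_loader, lr: float = 0.0025, momentum: float = 0.9):
    """Adapt on a stream of (possibly corrupted) test batches and return predictions."""
    model.train()  # keep normalization layers in adaptation mode
    optimizer = torch.optim.SGD(collect_norm_affine_params(model), lr=lr, momentum=momentum)
    predictions = []
    for x, _ in test_loader:  # labels are never used at test time
        logits = model(x)
        loss = entropy_loss(logits)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        predictions.append(logits.argmax(dim=1).detach())
    return torch.cat(predictions)
```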
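
The hyperparameters quoted in the Experiment Setup row can be gathered into a per-dataset configuration, as in the hedged sketch below. Only the numeric values come from the quoted setup; the dictionary keys and the `build_optimizer` helper are our own naming and do not appear in the released code.

```python
# Hedged sketch: per-dataset hyperparameters quoted in the Experiment Setup row.
# Key names and the helper function are illustrative; only the numeric values
# (learning rates, beta_th thresholds, pruning rate, momentum) come from the paper.
import torch

MECTA_SETUPS = {
    # lr: learning rate used in the authors' implementation with smaller batches;
    # beta_th: threshold for stopping layer training; prune_rate: cache pruning rate.
    "cifar10-c":  {"lr": 0.0025, "beta_th": 0.00125, "prune_rate": 0.7},
    "cifar100-c": {"lr": 0.0001, "beta_th": 0.0025,  "prune_rate": 0.7},
    "imagenet-c": {"lr": 0.0001, "beta_th": 0.00125, "prune_rate": 0.7},
}


def build_optimizer(params, dataset: str) -> torch.optim.SGD:
    """SGD with momentum 0.9, as stated for all test-time adaptation objectives."""
    cfg = MECTA_SETUPS[dataset]
    return torch.optim.SGD(params, lr=cfg["lr"], momentum=0.9)
```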