MECTA: Memory-Economic Continual Test-Time Model Adaptation
Authors: Junyuan Hong, Lingjuan Lyu, Jiayu Zhou, Michael Spranger
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On three datasets, CIFAR10, CIFAR100, and ImageNet, MECTA improves the accuracy by at least 6% with constrained memory and significantly reduces the memory costs of ResNet50 on ImageNet by at least 70% with comparable accuracy. Our codes can be accessed at https://github.com/SonyAI/MECTA. From Section 5 (Experiments): Datasets and pre-trained models. To evaluate the OOD generalization of models, we adopt three image-classification datasets: the CIFAR10-C, CIFAR100-C (Krizhevsky, 2009) and ImageNet-C (Deng et al., 2009) following previous arts (Niu et al., 2022). |
| Researcher Affiliation | Collaboration | Junyuan Hong¹, Lingjuan Lyu², Jiayu Zhou¹, Michael Spranger²; ¹Michigan State University, ²Sony AI; {hongju12,jiayuz}@msu.edu, {lingjuan.lv,michael.spranger}@sony.com |
| Pseudocode | Yes | Finally, we summarize the proposed method in Algorithm 1, where our method includes three hyperparameters to trade off accuracy and memory. Algorithm 1 Memory-Economic Continual Test-time Adaptation (MECTA). (A generic, non-MECTA adaptation-loop sketch is given after the table.) |
| Open Source Code | Yes | Our codes can be accessed at https://github.com/SonyAI/MECTA. |
| Open Datasets | Yes | To evaluate the OOD generalization of models, we adopt three image-classification datasets: the CIFAR10-C, CIFAR100-C (Krizhevsky, 2009) and ImageNet-C (Deng et al., 2009) following previous arts (Niu et al., 2022). |
| Dataset Splits | No | The paper describes a 'lifelong setting' with streaming data and sequential corruptions, but does not explicitly provide specific train/validation/test dataset splits with percentages, counts, or references to predefined splits for reproduction. |
| Hardware Specification | Yes | We implement our algorithm using PyTorch 1.12.1, cudatoolkit 11.6 on NVIDIA Tesla T4 GPUs. |
| Software Dependencies | Yes | We implement our algorithm using PyTorch 1.12.1, cudatoolkit 11.6 on NVIDIA Tesla T4 GPUs. (A version-check snippet is given after the table.) |
| Experiment Setup | Yes | All test-time adaptation objectives are optimized by stochastic gradient descent (SGD) with a momentum of 0.9. Tent and EATA utilize a batch size of 64 with a learning rate of 0.005 (0.00025) for CIFAR10 (CIFAR100 and ImageNet). In our implementation, we use 0.0025 (0.0001) as learning rates to stabilize the training with smaller batch sizes. EATA uses 2,000 samples to estimate a Fisher matrix for anti-forgetting regularization. For MECTA, we set the threshold βth for stopping layer training as 0.0025 for CIFAR100, 0.00125 for CIFAR10, and 0.00125 for ImageNet-C. The cache pruning rate is set to be 0.7 for all datasets. (These values are collected into a configuration sketch after the table.) |
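
To make the setting behind Algorithm 1 concrete, the sketch below shows a generic Tent-style continual test-time adaptation loop: entropy minimization over the BatchNorm affine parameters of a pre-trained ResNet-50, optimized with the SGD/momentum-0.9 settings quoted above. This is a minimal illustration of the adaptation setting only, not the authors' Algorithm 1; MECTA additionally prunes the activation cache and stops layer training when the batch-statistics shift falls below the threshold β_th.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

# Pre-trained source model; BatchNorm layers stay in training mode so their
# running statistics keep adapting to the incoming test stream.
model = resnet50(weights="IMAGENET1K_V1")
model.train()

# Tent-style parameter selection: adapt only the BatchNorm affine parameters.
for p in model.parameters():
    p.requires_grad_(False)
bn_params = []
for m in model.modules():
    if isinstance(m, nn.BatchNorm2d) and m.affine:
        m.weight.requires_grad_(True)
        m.bias.requires_grad_(True)
        bn_params += [m.weight, m.bias]

# SGD with momentum 0.9, as in the experiment setup quoted above.
optimizer = torch.optim.SGD(bn_params, lr=1e-4, momentum=0.9)

def entropy(logits: torch.Tensor) -> torch.Tensor:
    """Mean Shannon entropy of the batch predictions."""
    log_probs = F.log_softmax(logits, dim=1)
    return -(log_probs.exp() * log_probs).sum(dim=1).mean()

def adapt_step(x: torch.Tensor) -> torch.Tensor:
    """One adaptation step on an unlabeled test batch x; returns predictions."""
    logits = model(x)
    loss = entropy(logits)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return logits.argmax(dim=1)
```

In the continual (lifelong) setting evaluated by the paper, `adapt_step` would be called on every incoming batch of the corrupted stream without resetting the model between corruptions.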
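
The reported environment (PyTorch 1.12.1, cudatoolkit 11.6, NVIDIA Tesla T4) can be sanity-checked with a few lines; the snippet only reports what the local installation provides and cannot enforce the paper's versions.

```python
import torch

# Paper reports PyTorch 1.12.1, cudatoolkit 11.6, NVIDIA Tesla T4 GPUs.
print("torch  :", torch.__version__)
print("cuda   :", torch.version.cuda)
if torch.cuda.is_available():
    print("device :", torch.cuda.get_device_name(0))
else:
    print("device : CPU only (no CUDA device visible)")
```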
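
For the "Experiment Setup" row, the MECTA hyperparameters can be gathered into a small per-dataset configuration; the dictionary layout and the `build_optimizer` helper below are hypothetical conveniences for illustration, not names from the MECTA codebase.

```python
import torch

# Per-dataset settings quoted above: "lr" follows the authors' implementation
# (0.0025 for CIFAR10, 0.0001 for CIFAR100 and ImageNet), "beta_th" is the
# threshold for stopping layer training, "prune_rate" is the cache pruning rate.
MECTA_HPARAMS = {
    "cifar10":  {"lr": 0.0025, "beta_th": 0.00125, "prune_rate": 0.7},
    "cifar100": {"lr": 0.0001, "beta_th": 0.0025,  "prune_rate": 0.7},
    "imagenet": {"lr": 0.0001, "beta_th": 0.00125, "prune_rate": 0.7},
}

def build_optimizer(params, dataset: str) -> torch.optim.SGD:
    """SGD with momentum 0.9, as stated for all test-time adaptation objectives."""
    cfg = MECTA_HPARAMS[dataset]
    return torch.optim.SGD(params, lr=cfg["lr"], momentum=0.9)
```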