Cross-Device Collaborative Test-Time Adaptation
Authors: Guohao Chen, Shuaicheng Niu, Deyu Chen, Shuhai Zhang, Changsheng Li, Yuanqing Li, Mingkui Tan
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we propose test-time Collaborative Lifelong Adaptation (CoLA), a general paradigm that can be incorporated with existing advanced TTA methods to boost adaptation performance and efficiency in a multi-device collaborative manner. Specifically, we maintain and store a set of device-shared domain knowledge vectors, which accumulate the knowledge learned by all devices during their lifelong adaptation. Based on this, CoLA conducts two collaboration strategies for devices with different computational resources and latency demands. 1) A knowledge reprogramming learning strategy jointly learns new domain-specific model parameters and a reweighting term to reprogram existing shared domain knowledge vectors, termed adaptation on principal agents. 2) A similarity-based knowledge aggregation strategy solely aggregates the knowledge stored in the shared domain vectors according to domain similarities in an optimization-free manner, termed adaptation on follower agents. Experiments verify that CoLA is simple yet effective: it boosts the efficiency of TTA and demonstrates remarkable superiority in collaborative, lifelong, and single-domain TTA scenarios; e.g., on follower agents, it enhances accuracy by over 30% on ImageNet-C while maintaining nearly the same efficiency as standard inference. (A minimal sketch of the follower-agent strategy follows the table.) |
| Researcher Affiliation | Collaboration | Guohao Chen (1,2), Shuaicheng Niu (3), Deyu Chen (1), Shuhai Zhang (1,2), Changsheng Li (4), Yuanqing Li (1,2), Mingkui Tan (1,2). (1) South China University of Technology, (2) Pazhou Laboratory, (3) Nanyang Technological University, (4) Beijing Institute of Technology |
| Pseudocode | Yes | We summarize the pseudo-code in Algorithm 1 and illustrate the overall pipeline of CoLA in Figure 2. |
| Open Source Code | Yes | The source code is available at https://github.com/Cascol-Chen/COLA. |
| Open Datasets | Yes | Datasets and models. We conduct experiments on ImageNet-1K [6], as well as five benchmarks for OOD generalization, i.e., ImageNet-C [16] (corrupted images of 15 types in 4 main categories, each type with 5 severity levels), ImageNet-R (various artistic renditions of 200 ImageNet classes) [15], ImageNet-Sketch [55], ImageNet-A [17], and ImageNet-V2 [44]. |
| Dataset Splits | Yes | The model is trained on the source ImageNet-1K [6] training set and the model weights are obtained from the timm repository [60]... Evaluation on lifelong TTA. In Table 1, the model is adapted online to 15 corruptions over 10 rounds (150 corruptions in total)... ImageNet-V2 for evaluation, in which the images are sampled to match the class frequency distribution of the original ImageNet validation dataset. (The lifelong protocol is sketched below the table.) |
| Hardware Specification | Yes | Table 5: Comparison w.r.t. wall-clock time and memory on ImageNet-C (Gaussian, level 5) on an A100 GPU... All experiments are conducted on a single NVIDIA A100 GPU, using the PyTorch framework, version 1.8.0. (A timing/memory instrumentation sketch follows the table.) |
| Software Dependencies | Yes | All experiments are conducted on a single NVIDIA A100 GPU, using the PyTorch framework, version 1.8.0. |
| Experiment Setup | Yes | θ is optimized by following the update rules of the integrated baseline as listed in Appendix C. α is updated via the AdamW optimizer with a learning rate of 0.1. The shift detection threshold z is set to 0.1. For follower agents, we consistently set T_f in Eqn. (3) to 5 for all experiments. More details are given in Appendices A and C. ... The moving average factor λ is set to 0.2. ... We use SGD as the update rule, with a momentum of 0.9, a batch size of 64, and a learning rate of 0.001. (An optimizer-configuration sketch follows the table.) |
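
The sketches below make the quoted evidence concrete. First, the follower-agent strategy from the abstract: optimization-free aggregation of the device-shared domain knowledge vectors, weighted by domain similarity. This is a minimal sketch, assuming cosine similarity over a stored vector pool and treating T_f from Eqn. (3) as a softmax temperature; `domain_vectors`, `current_descriptor`, and all shapes are illustrative, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def aggregate_domain_knowledge(domain_vectors: torch.Tensor,
                               current_descriptor: torch.Tensor,
                               t_f: float = 5.0) -> torch.Tensor:
    """Optimization-free aggregation for follower agents (sketch).

    domain_vectors:     (K, D) pool of device-shared domain knowledge vectors
    current_descriptor: (D,)   descriptor of the follower's current domain
    t_f:                assumed here to act as a softmax temperature (paper sets T_f = 5)
    """
    # Similarity of the current domain to each stored domain vector.
    sims = F.cosine_similarity(current_descriptor.unsqueeze(0), domain_vectors, dim=1)  # (K,)
    weights = F.softmax(t_f * sims, dim=0)  # domain-similarity weights, sum to 1
    return weights @ domain_vectors         # (D,) aggregated knowledge vector

# Toy usage: a pool of 8 stored domains with 512-dim knowledge vectors.
pool = torch.randn(8, 512)
descriptor = torch.randn(512)
aggregated = aggregate_domain_knowledge(pool, descriptor)
```

No gradient step is involved, which is consistent with the paper's claim that follower agents run at nearly the same cost as standard inference.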
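The lifelong protocol quoted under Dataset Splits visits the 15 ImageNet-C corruption types for 10 rounds without resetting the model. A schematic of the 150-domain sequence (the corruption names are the standard ImageNet-C types; the per-batch adaptation step itself is elided):

```python
# The 15 ImageNet-C corruption types, grouped by the 4 main categories.
CORRUPTIONS = [
    "gaussian_noise", "shot_noise", "impulse_noise",                  # noise
    "defocus_blur", "glass_blur", "motion_blur", "zoom_blur",         # blur
    "snow", "frost", "fog", "brightness",                             # weather
    "contrast", "elastic_transform", "pixelate", "jpeg_compression",  # digital
]

# 10 rounds over all corruptions yields a sequence of 150 domains.
lifelong_sequence = [c for _ in range(10) for c in CORRUPTIONS]
assert len(lifelong_sequence) == 150

for corruption in lifelong_sequence:
    # Build this domain's test loader and adapt online, batch by batch,
    # carrying the adapted state into the next domain (no reset).
    pass
```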
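The wall-clock and memory comparison in Table 5 can be reproduced with standard PyTorch instrumentation. This is our measurement sketch, not the authors' script; `run_adaptation` is a hypothetical stand-in for one adaptation pass over ImageNet-C (Gaussian noise, level 5).

```python
import time
import torch

def run_adaptation():
    # Hypothetical stand-in for one adaptation pass; real code would
    # loop over the ImageNet-C (Gaussian, level 5) test loader.
    x = torch.randn(64, 3, 224, 224, device="cuda")
    _ = x.mean()

torch.cuda.reset_peak_memory_stats()
torch.cuda.synchronize()          # ensure prior GPU work is done before timing
start = time.time()
run_adaptation()
torch.cuda.synchronize()          # wait for all kernels before stopping the clock
elapsed = time.time() - start
peak_mb = torch.cuda.max_memory_allocated() / 2**20
print(f"wall-clock: {elapsed:.2f} s, peak GPU memory: {peak_mb:.0f} MB")
```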
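Finally, the optimizer configuration quoted under Experiment Setup pairs two update rules: θ follows the integrated baseline (SGD with momentum 0.9 and learning rate 0.001 in the quoted setting), while the reweighting term α is updated with AdamW at a learning rate of 0.1. A minimal sketch, with a placeholder backbone and a hypothetical pool size:

```python
import torch
import torch.nn as nn

model = nn.Linear(512, 1000)  # stand-in for the adapted backbone parameters θ
num_vectors = 8               # hypothetical size of the shared knowledge pool
alpha = torch.zeros(num_vectors, requires_grad=True)  # reweighting term over shared vectors

# θ follows the integrated baseline's update rule; the quoted setting uses
# SGD with momentum 0.9 and learning rate 0.001 (batch size 64 at loading time).
opt_theta = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

# α is updated with AdamW at a learning rate of 0.1, as quoted.
opt_alpha = torch.optim.AdamW([alpha], lr=0.1)
```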