Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Partition-Then-Adapt: Combating Prediction Bias for Reliable Multi-Modal Test-Time Adaptation

Authors: Guowei Wang, Fan Lyu, Changxing Ding

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments validate the effectiveness of PTA, surpassing state-of-the-art method by 6.1% on Kinetics50-MC and 5.8% on VGGSound-MC, respectively.
Researcher Affiliation	Academia	Guowei Wang (1), Fan Lyu (2), Changxing Ding (1); (1) South China University of Technology; (2) National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences; EMAIL, EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: Quantile Ranking. 1: Input: Sequence T 2: Output: Quantile sequence Q 3: T_sorted ← sort(T), Q ← [ ] 4: for each t ∈ T do 5: Find first index j where T_sorted[j] = t 6: Find last index k where T_sorted[k] = t 7: Compute Q_t ← (j + k) / (2|T|) 8: Append Q_t to Q 9: end for 10: Return Q. Algorithm 2: PTA. 1: Input: Trained model M_Θ, Data batch X 2: Calculate the prediction logits: p(X) = M_Θ(X) 3: Reweight each element in X using Eq. (1), Eq. (2), and Algorithm 1 4: Calculate the weighted entropy loss using Eq. (3) 5: Calculate the MMD-based loss via Eq. (4) 6: Calculate the overall loss using Eq. (5) 7: Update the tunable parameters Θ of M_Θ
Open Source Code	Yes	Code of this paper is available at https://github.com/MPI-Lab/PTA.
Open Datasets	Yes	To comprehensively validate MM-TTA, we conduct experiments on benchmarks featuring both synthetic and real-world domain shifts. For synthetic shifts, we apply 15 types of corruptions [11] to the video modality and 6 types of corruptions [43] to the audio modality of Kinetics50 [14] and VGGSound [4], following the protocol in [43]. This results in a total of 90 combinations for each benchmark, e.g., Kinetics50-MC and VGGSound-MC. Each corruption type is applied at 5 severity levels. For real-world domain shifts, we choose CMU-MOSI [46], CMU-MOSEI [47], and CH-SIMS [44] for evaluation, each comprising three modalities (e.g., text, audio, and video).
Dataset Splits	Yes	For synthetic shifts, we apply 15 types of corruptions [11] to the video modality and 6 types of corruptions [43] to the audio modality of Kinetics50 [14] and VGGSound [4], following the protocol in [43]. This results in a total of 90 combinations for each benchmark, e.g., Kinetics50-MC and VGGSound-MC. Each corruption type is applied at 5 severity levels ... For the experiments on real-world domain shifts, we provide pre-trained models for CMU-MOSI [46], CMU-MOSEI [47], and CH-SIMS [44] following the training protocol [10].
Hardware Specification	Yes	We run all experiments with 5 random seeds on one NVIDIA 4090 GPU, and report the average accuracy.
Software Dependencies	No	The paper mentions using 'Adam optimizer' and references official code implementations for baselines (e.g., 'The implementation follows the official code' for TENT), but does not explicitly list specific software libraries (such as PyTorch or TensorFlow) with their version numbers.
Experiment Setup	Yes	The learning rate and batch size are set to 2e-4 and 32 for Kinetics-C, and 1e-4 and 64 for VGGSound-C, respectively. For the experiments on real-world domain shifts, we provide pre-trained models for CMU-MOSI [46], CMU-MOSEI [47], and CH-SIMS [44] following the training protocol [10]. The learning rate and batch size are set to 1e-3 and 24. Details are in Appendix B. By default, hyperparameters are set as s = 0.5, λ = 1. Following [43, 50], we update query/key/value transformation matrices of the attention layer in the fusion block.
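
The Quantile Ranking procedure quoted in the Pseudocode row (Algorithm 1) can be illustrated with a short Python sketch. This is not the authors' implementation (see the linked repository for that); the function name `quantile_ranking` is hypothetical, and the sketch assumes 0-based indices into the sorted sequence, which the extracted pseudocode does not specify.

```python
from bisect import bisect_left, bisect_right


def quantile_ranking(values):
    """Return one quantile per element, following the quoted Algorithm 1.

    Assumption: indices j and k are 0-based positions in the sorted
    sequence; the source pseudocode does not state the indexing base.
    """
    n = len(values)
    if n == 0:
        return []
    sorted_values = sorted(values)
    quantiles = []
    for t in values:
        # First index j where sorted_values[j] == t.
        first = bisect_left(sorted_values, t)
        # Last index k where sorted_values[k] == t.
        last = bisect_right(sorted_values, t) - 1
        # Q_t = (j + k) / (2 * |T|): tied values share one quantile.
        quantiles.append((first + last) / (2 * n))
    return quantiles


if __name__ == "__main__":
    print(quantile_ranking([3, 1, 2, 2]))  # [0.75, 0.0, 0.375, 0.375]
```

Averaging the first and last matching index gives tied values the same quantile, so the ranking does not depend on the order in which ties appear in the input. Under the 0-based assumption the quantiles fall in the range [0, 1).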