Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Smoothing the Shift: Towards Stable Test-Time Adaptation under Complex Multimodal Noises

Authors: Zirun Guo, Tao Jin

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on two public datasets show the effectiveness and superiority over existing methods under the complex noise patterns in multimodal data. Code is available at https://github.com/zrguo/SuMi.
Researcher Affiliation	Academia	Zirun Guo Tao Jin Zhejiang University EMAIL
Pseudocode	Yes	Algorithm 1 SuMi
Open Source Code	Yes	Code is available at https://github.com/zrguo/SuMi.
Open Datasets	Yes	Datasets. We use two widely used multimodal datasets, Kinetics50 (Kay et al., 2017) and VGGSound (Chen et al., 2020) for evaluation. Following previous work (Hendrycks & Dietterich, 2019; Yang et al., 2024), we introduce 15 different types of corruptions and 6 types for audio to simulate the distribution shifts in real-world applications.
Dataset Splits	Yes	Following Yang et al. (2024), we use a subset of Kinetics which consists of 50 classes, 29,204 training pairs and 2,466 test pairs.
Hardware Specification	No	The paper does not provide specific hardware details for running its experiments. It mentions using a pre-trained model and an optimizer but no information about GPUs, CPUs, or other computing resources.
Software Dependencies	No	The paper does not provide specific software dependency details with version numbers. It mentions using the Adam optimizer and the pre-trained CAV-MAE model, but no versions for frameworks like PyTorch or TensorFlow, or other libraries.
Experiment Setup	Yes	We use Adam optimizer with a learning rate of 1e-4/1e-5 and batch size of 16/64 for Kinetics50-C and VGGSound-C, respectively. The multimodal threshold γ_m in Equation 4 and the normalization factor Ent_0 in Equation 7 are set to 0.4 ln C following Niu et al. (2022) by default where C is the number of task classes. The unimodal threshold γ_u in Equation 4 is set to e^{-1} by default. The smoothing coefficient β is set to 0.6/0.9, the weighting term λ is set to 5.0 and the unimodal assistance µ is set to 1.0 by default for Kinetics50-C and VGGSound-C. For strong OOD adaptation, we set the mutual information sharing term t_0 as iter/2. Following previous work (Niu et al., 2023; Gong et al., 2023a; Chen et al., 2024; Guo et al., 2024b), we update the affine parameters of normalization layers.
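
The settings quoted in the Experiment Setup row can be collected into a small configuration object. The sketch below is illustrative only: the class name `AdaptConfig`, its field names, and the helper `vggsound_c` are hypothetical and do not come from the paper or its repository. It assumes the "a/b" values in the row pair with Kinetics50-C and VGGSound-C in that order, and that Kinetics50 has C = 50 classes as stated in the Dataset Splits row. The VGGSound class count is not given above, so it is left as a parameter.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AdaptConfig:
    """Hypothetical container for the reported adaptation hyperparameters."""
    num_classes: int      # C, the number of task classes
    learning_rate: float  # Adam learning rate
    batch_size: int
    beta: float           # smoothing coefficient
    lam: float = 5.0      # weighting term (lambda)
    mu: float = 1.0       # unimodal assistance

    @property
    def gamma_m(self) -> float:
        # Multimodal threshold, reported as 0.4 ln C; the same value is
        # reported for the normalization factor Ent_0.
        return 0.4 * math.log(self.num_classes)


# Kinetics50-C: first value of each reported pair, with C = 50.
KINETICS50_C = AdaptConfig(num_classes=50, learning_rate=1e-4,
                           batch_size=16, beta=0.6)


def vggsound_c(num_classes: int) -> AdaptConfig:
    # VGGSound-C: second value of each reported pair. The class count is
    # not stated in this entry, so the caller must supply it.
    return AdaptConfig(num_classes=num_classes, learning_rate=1e-5,
                       batch_size=64, beta=0.9)
```

Under these assumptions, `KINETICS50_C.gamma_m` evaluates 0.4 ln 50, so the threshold scales with the logarithm of the class count rather than being a fixed constant across datasets.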