Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Exploring Safety Supervision for Continual Test-time Domain Adaptation
Authors: Xu Yang, Yanan Gu, Kun Wei, Cheng Deng
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results demonstrate that our method achieves state-of-the-art performance on several benchmark datasets. 4 Experiments In this section, we review the proposed method on several benchmark tasks: CIFAR10-to-CIFAR10C (Standard and Gradual), CIFAR100-to-CIFAR100C, and ImageNet-to-ImageNet-C. |
| Researcher Affiliation | Academia | Xu Yang, Yanan Gu, Kun Wei and Cheng Deng Xidian University EMAIL |
| Pseudocode | No | The paper describes the proposed methods using textual descriptions and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or provide links to a code repository for the described methodology. |
| Open Datasets | Yes | We use CIFAR10, CIFAR100, and ImageNet as the source domain datasets, and CIFAR10C, CIFAR100C, and ImageNet-C as the corresponding target domain datasets, respectively. The target domain datasets were originally created to evaluate the robustness of classification networks [Hendrycks and Dietterich, 2019]. |
| Dataset Splits | No | The paper describes source data (training) and continually changing target test data (for adaptation and evaluation), but it does not specify a distinct validation dataset split with proportions or sample counts needed for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or types of computing clusters used for running experiments. |
| Software Dependencies | No | The paper mentions 'Adam' for optimization and specific network architectures (WideResNet28, ResNeXt-29, ResNet-50), but does not provide version numbers for any software dependencies, libraries, or frameworks. |
| Experiment Setup | Yes | We use Adam to optimize the network and set the learning rate to 1e-3. The data augmentation strategy is the same as [Wang et al., 2022], including color jitter, gaussian blur, gaussian noise, random affine, and random horizontal flip, N = 8. [...] N = 4. [...] The smoothing factor α is set as 0.99. |
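
The reported experiment setup can be captured as a small configuration record, which may help when attempting a reproduction. The sketch below is illustrative only: `ReportedSetup` and `ema_update` are hypothetical names, not taken from the paper. It assumes that the smoothing factor α is the coefficient of an exponential moving average, which the quoted text does not confirm, and it omits the two values of N because the quoted text elides what they refer to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters quoted in the Experiment Setup row (hypothetical container)."""
    optimizer: str = "Adam"
    learning_rate: float = 1e-3
    smoothing_factor: float = 0.99  # alpha in the quoted text
    augmentations: tuple = (
        "color jitter",
        "gaussian blur",
        "gaussian noise",
        "random affine",
        "random horizontal flip",
    )


def ema_update(average, new_values, alpha):
    """Generic exponential moving average: alpha * old + (1 - alpha) * new.

    ASSUMPTION: this is one common use of a smoothing factor; the source
    does not state how alpha is applied in the paper's method.
    """
    return [alpha * a + (1.0 - alpha) * v for a, v in zip(average, new_values)]


setup = ReportedSetup()
# One smoothing step on toy values, using the reported alpha.
smoothed = ema_update([1.0, 0.0], [0.0, 1.0], setup.smoothing_factor)
```

With α = 0.99, each step keeps 99% of the running value, so the smoothed quantity changes slowly. Hardware and software versions are not reported, so they are left out of the record.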