Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness
Authors: Suhyeok Jang, Seojin Kim, Jinwoo Shin, Jongheon Jeong
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that FT-CADIS has established the state-of-the-art certified robustness among denoised smoothing methods across all ℓ2-adversary radius in a variety of benchmarks, such as CIFAR-10 and ImageNet. |
| Researcher Affiliation | Academia | Suhyeok Jang* EMAIL Korea Advanced Institute of Science & Technology (KAIST) Seojin Kim* EMAIL Korea Advanced Institute of Science & Technology (KAIST) Jinwoo Shin EMAIL Korea Advanced Institute of Science & Technology (KAIST) Jongheon Jeong EMAIL Korea University |
| Pseudocode | Yes | Algorithm 1 Fine-Tuning with Confidence-Aware Denoised Image Selection (FT-CADIS) Algorithm 2 Noise-and-Denoise Procedure (Carlini et al., 2023) |
| Open Source Code | No | The paper provides links to third-party code used or code for baseline methods (e.g., "https://github.com/ethz-spylab/diffusion-denoised-smoothing" for baselines, "https://github.com/JamesQFreeman/LoRA-ViT" for LoRA implementation), but does not explicitly state that the source code for the methodology described in *this* paper (FT-CADIS) is made available or provide a direct link to its implementation. |
| Open Datasets | Yes | CIFAR-10 (Krizhevsky, 2009) consists of 60,000 RGB images of size 32×32, with 50,000 images for training and 10,000 for testing. ... The full dataset can be downloaded at https://www.cs.toronto.edu/~kriz/cifar.html. ImageNet (Russakovsky et al., 2015) consists of 1.28 million training images and 50,000 validation images, each labeled into one of 1,000 classes. ... The full dataset can be downloaded at https://image-net.org/download. |
| Dataset Splits | Yes | CIFAR-10 (Krizhevsky, 2009) consists of 60,000 RGB images of size 32×32, with 50,000 images for training and 10,000 for testing. ImageNet (Russakovsky et al., 2015) consists of 1.28 million training images and 50,000 validation images, each labeled into one of 1,000 classes. For the training images, we apply 224×224 randomly resized cropping and horizontal flipping. For the test images, we resize them to 256×256 resolution, followed by center cropping to 224×224. |
| Hardware Specification | Yes | In summary, we conduct our experiments using NVIDIA GeForce RTX 2080 Ti GPUs for CIFAR-10, NVIDIA GeForce RTX 3090 and NVIDIA RTX A6000 GPUs for ImageNet. In the CIFAR-10 experiments, we utilize 4 NVIDIA GeForce RTX 2080 Ti GPUs for fine-tuning per run, resulting in 8 hours of training cost. During the certification, we use 7 NVIDIA GeForce RTX 2080 Ti GPUs for data splitting, taking 9 minutes per image (with N = 100,000 for each inference) to perform a single pass of smoothed inference. In the ImageNet experiments, we utilize 4 NVIDIA RTX A6000 GPUs for fine-tuning per run, observing 51 hours of training cost. During the certification, 8 NVIDIA GeForce RTX 3090 GPUs are used in parallel, taking 4 minutes per image (with N = 10,000 for each inference) to complete a single pass of smoothed inference. |
| Software Dependencies | No | The paper mentions software components like "AdamW (Loshchilov & Hutter, 2019)" for the optimizer, but it does not specify concrete version numbers for software libraries, programming languages (e.g., Python), or deep learning frameworks (e.g., PyTorch, TensorFlow, CUDA). |
| Experiment Setup | Yes | Table 6: Denoised fine-tuning settings for the off-the-shelf classifier on CIFAR-10 and ImageNet. (a) CIFAR-10 end-to-end fine-tuning: Optimizer AdamW (Loshchilov & Hutter, 2019), Optimizer momentum β1, β2 = 0.9, 0.999, Base learning rate 5e-4 (σ = 0.25, 0.50), 1e-4 (σ = 1.00), Weight decay start, end = 0.04, 0.4 (cosine schedule), Layer-wise lr decay (Clark et al., 2020; Bao et al., 2022) 0.65, Batch size 128, Learning rate schedule cosine decay (Loshchilov & Hutter, 2022), Warmup epochs (Goyal et al., 2017) 3, Training epochs 30 (early stopping at 20), Drop path (Huang et al., 2016) 0.2, Gradient clipping (Zhang et al., 2019b) 0.3. (b) ImageNet LoRA (Hu et al., 2022) fine-tuning: Optimizer AdamW (Loshchilov & Hutter, 2019), Optimizer momentum β1, β2 = 0.9, 0.999, Base learning rate 2e-4 (σ = 0.25), 4e-4 (σ = 0.50, 1.00), Weight decay start, end = 0.02, 0.2 (σ = 0.25), start, end = 0.01, 0.1 (σ = 0.50, 1.00), Layer-wise lr decay (Clark et al., 2020; Bao et al., 2022) 0.8 (σ = 0.25), 0.9 (σ = 0.50, 1.00), Batch size 128, Learning rate schedule cosine decay (Loshchilov & Hutter, 2022), Warmup epochs (Goyal et al., 2017) 1, Training epochs 10 (early stopping at 5), Drop path (Huang et al., 2016) 0.0, Gradient clipping (Zhang et al., 2019b) 1.0, LoRA rank r 4, LoRA scaler α 4. |
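The certification figures quoted above (e.g., N = 100,000 noisy samples per CIFAR-10 image) follow the standard randomized smoothing recipe, in which the certified ℓ2 radius is R = σ · Φ⁻¹(p̲A) (Cohen et al., 2019). A minimal sketch of the radius computation, assuming a lower confidence bound `p_a_lower` on the top-class probability has already been estimated from the N samples (the bound itself, e.g. Clopper-Pearson, is not shown):

```python
from statistics import NormalDist

def certified_radius(p_a_lower: float, sigma: float) -> float:
    """Certified l2 radius R = sigma * inverse-normal-CDF(p_a_lower),
    following Cohen et al. (2019).

    `p_a_lower` is an assumed, precomputed lower confidence bound on the
    smoothed classifier's top-class probability under Gaussian noise.
    When it does not exceed 1/2, the classifier abstains (radius 0).
    """
    if p_a_lower <= 0.5:
        return 0.0  # cannot certify: abstain
    return sigma * NormalDist().inv_cdf(p_a_lower)
```

The radius grows with both the noise level σ used at fine-tuning/certification time and the confidence of the smoothed prediction, which is why the paper reports results separately for σ = 0.25, 0.50, and 1.00.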
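The CIFAR-10 schedule reported in the Experiment Setup row (3 warmup epochs, then cosine decay over 30 training epochs from a 5e-4 base rate) can be sketched as a plain per-epoch function. The per-epoch granularity and the zero minimum rate are illustrative assumptions, not details stated in the paper:

```python
import math

def lr_at_epoch(epoch: int, base_lr: float = 5e-4, warmup_epochs: int = 3,
                total_epochs: int = 30, min_lr: float = 0.0) -> float:
    """Linear warmup followed by cosine decay, matching the reported
    CIFAR-10 settings (base lr 5e-4 for sigma = 0.25/0.50, 3 warmup
    epochs, 30 total epochs). `min_lr` and epoch-level (rather than
    step-level) updates are assumptions for illustration.
    """
    if epoch < warmup_epochs:
        # Linear ramp from 0 up to base_lr over the warmup period.
        return base_lr * (epoch + 1) / warmup_epochs
    # Cosine anneal from base_lr down to min_lr over the remaining epochs.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

Note that with the reported early stopping at epoch 20, training would halt before the cosine curve reaches its minimum.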