Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Exploring and Leveraging Class Vectors for Classifier Editing

Authors: Jaeik Kim, Jaeyoung Do

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We present extensive experiments demonstrating the effectiveness of Class Vectors in real-world applications such as model unlearning, adapting to unfamiliar environments, preventing typographic attacks, and optimizing triggers for backdoor attacks.
Researcher Affiliation	Academia	AIDAS Laboratory, IPAI & ECE, Seoul National University
Pseudocode	Yes	Algorithm 1: Latent-space injection; Algorithm 2: Weight-space mapping; Algorithm 3: Pseudocode for optimizing classifier encoder with Class Vector; Algorithm 4: Pseudocode for optimizing adversarial trigger with Class Vector
Open Source Code	No	We plan to release our code under the Apache 2.0 license.
Open Datasets	Yes	MNIST [41]: A handwritten digit classification task with 60,000 training images and 10,000 test images, categorized into 10 classes from 0 to 9. EuroSAT [24]: Land use and cover satellite image classification task, containing 13 spectral bands and 10 classes, with 16,000 training images and 5000 test images, 27,000 images in total with validation set. SVHN [51]: A real-world digit classification benchmark task containing a total of 630,000 images with 2,700 test data of house number plates. GTSRB [63]: Traffic sign classification task with 43 categories, containing 39,209 training data and 12,630 test data under varied lighting and complex backgrounds. RESISC45 [9]: A benchmark for remote sensing scene classification task with 31,500 images across 45 distinct scene types. DTD [11]: Image texture classification task with 47 categories, containing a collection of 5,640 texture images, sourced from diverse real-world settings. CIFAR10 [38]: The dataset is a 10-class classification task consisting of 60,000 images, with 6,000 images per class. It contains 50,000 training images and 10,000 test images. CIFAR100 [38]: The dataset is a subset of the Tiny Images dataset and consists of 60,000 color images. The 100 classes are organized into 20 superclasses, with each class containing 600 images. For evaluating Class Vectors in adapting to snowy environments (§4.3) and defending against typography attacks (§4.4), we use snowy ImageNet and clean and text-attached object images from previous study [60].
Dataset Splits	Yes	MNIST [41]: A handwritten digit classification task with 60,000 training images and 10,000 test images, categorized into 10 classes from 0 to 9. EuroSAT [24]: ... with 16,000 training images and 5000 test images, 27,000 images in total with validation set. CIFAR10 [38]: ... It contains 50,000 training images and 10,000 test images. CIFAR100 [38]: ... It contains 50,000 training images and 10,000 test images.
Hardware Specification	Yes	All our experiments are conducted on a single NVIDIA A100 GPU.
Software Dependencies	No	The paper refers to deep learning models like ViT-B/16, ViT-B/32, ViT-L/14, MLP, ResNet-18, and BERT-Base, and uses methods like Task Arithmetic, but does not specify software dependencies with version numbers (e.g., PyTorch 1.x, TensorFlow 2.x).
Experiment Setup	Yes	The Vision Transformer (ViT-B/32) is fine-tuned per task for approximately 22 epochs with a learning rate of 1e-5 and a batch size of 128. For both the MLP and ResNet-18, we train on MNIST and CIFAR-10 with a learning rate of 1e-3, batch size 128 for 10 epochs, and on CIFAR-100 with a learning rate of 1e-4, batch size 512 for 300 epochs. ... Tables 8, 9, 10, 11, 12, 13 detail hyperparameters for specific tasks including epochs, sample size, learning rate, scaling coefficient, and reference sample count.
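
For readers attempting a reproduction, the training hyperparameters quoted in the Experiment Setup row could be collected into a small lookup table. The following is a minimal sketch, not the authors' code: the names `TrainConfig`, `build_configs`, and `lookup` are hypothetical, the "per-task" dataset label is a placeholder, and the ViT-B/32 epoch count of 22 is the approximate value quoted above. Task-specific values from Tables 8 to 13 of the paper are not included.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """One training setting as quoted in the Experiment Setup row (hypothetical container)."""
    model: str
    dataset: str
    learning_rate: float
    batch_size: int
    epochs: int


def build_configs():
    """Return the settings stated in the row above; nothing else is filled in."""
    # ViT-B/32 is fine-tuned per task for "approximately 22 epochs".
    configs = [TrainConfig("ViT-B/32", "per-task", 1e-5, 128, 22)]
    # The MLP and ResNet-18 share the same settings per dataset.
    for model in ("MLP", "ResNet-18"):
        for dataset in ("MNIST", "CIFAR-10"):
            configs.append(TrainConfig(model, dataset, 1e-3, 128, 10))
        configs.append(TrainConfig(model, "CIFAR-100", 1e-4, 512, 300))
    return configs


def lookup(model, dataset):
    """Find the configuration for a model/dataset pair, or raise KeyError."""
    for cfg in build_configs():
        if cfg.model == model and cfg.dataset == dataset:
            return cfg
    raise KeyError((model, dataset))


if __name__ == "__main__":
    for cfg in build_configs():
        print(cfg)
```

Keeping the settings in one immutable structure makes it easy to check a reproduction script against the reported values; optimizer, scheduler, and library versions are not stated in the rows above and would have to come from the paper or its code release.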