Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
NegMerge: Sign-Consensual Weight Merging for Machine Unlearning
Authors: Hyo Seo Kim, Dongyoon Han, Junsuk Choe
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on zero-shot and standard image recognition tasks across twelve datasets and four backbone architectures show that our approach outperforms state-of-the-art methods while requiring similar or fewer computational resources. |
| Researcher Affiliation | Collaboration | Hyo Seo Kim 1 Dongyoon Han 2 Junsuk Choe 1 ... 1Sogang University 2NAVER AI Lab. Correspondence to: Junsuk Choe <EMAIL>, Dongyoon Han <EMAIL>. |
| Pseudocode | No | The paper describes the method in Section 3.2 'The Proposed Method: NegMerge' using detailed textual descriptions for each step (Step 1: Calculating Diverse Task Vectors, Step 2: Identifying Task Vector Elements for Forget Set, Step 3: Final Task Vector for Negation), but does not present it as structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/naver-ai/negmerge. |
| Open Datasets | Yes | In the CLIP scenario, we follow the training and evaluation protocols from the Task Arithmetic paper (Ilharco et al., 2022a). We assess unlearning performance on eight datasets: SUN397 (Xiao et al., 2016), Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2019), SVHN (Yuval, 2011), GTSRB (Stallkamp et al., 2011), MNIST (LeCun, 1998), and DTD (Cimpoi et al., 2014), while using ImageNet (Deng et al., 2009) as the retain set to evaluate retaining performance. ... In the standard classifier scenario, we evaluate unlearning performance on CIFAR-10 (Krizhevsky et al., 2009), CUB200-2011 (Wah et al., 2011), and Tiny ImageNet (Le & Yang, 2015)... |
| Dataset Splits | Yes | Table 2 presents a comparison of various unlearning techniques on CIFAR-10 using ResNet-18. In this task, we randomly select 10% of the training set as the forget set. ... We use the accuracies of the retain set Dr, forget set Df, and test set Dtest to evaluate performance. |
| Hardware Specification | No | Most experiments were conducted on the NAVER Smart Machine Learning (NSML) platform (Sung et al., 2017). This mentions a platform but does not specify the underlying hardware like GPU or CPU models. |
| Software Dependencies | No | The paper mentions optimizers (AdamW) and models (ResNet-18, VGG-16, CLIP ViT-{B/32, B/16, L/14}) but does not provide specific version numbers for software dependencies or libraries used. |
| Experiment Setup | Yes | In the CLIP scenario, for fine-tuning, we set the batch size to 128 and use a learning rate of 1e-5 with a cosine annealing schedule. We utilize the AdamW optimizer, applying a weight decay of 0.1. ... In the standard image classifier unlearning scenario, we fine-tune models with varied hyperparameters. For CIFAR-10, we use ResNet-18 with a batch size of 256 and a learning rate of 0.05, and VGG-16 with a batch size of 64 and a learning rate of 0.01. Instead of data augmentation, we adjust training settings, setting the number of epochs to 40, 50, or 60, weight decay to 1e-4, 5e-5, or 1e-5, and label smoothing to 0, 0.05, or 0.1, resulting in 27 fine-tuned models. |
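The three steps quoted in the Pseudocode row (compute diverse task vectors, keep only sign-consensual elements, negate the merged vector) can be sketched in a few lines. This is a minimal illustration reconstructed from the textual description only; the aggregation rule for consensual elements (averaging here) and the final subtraction step are assumptions, not the authors' verified implementation (see their repository for the real code).

```python
def negmerge(task_vectors):
    """Sketch of sign-consensual merging over k flattened task vectors.

    task_vectors: list of equal-length lists of floats, each the
    difference (fine-tuned weights - pretrained weights) from one of
    the diversely fine-tuned models (Step 1).
    Returns one merged task vector (Step 3 input).
    """
    k = len(task_vectors)
    merged = []
    for elems in zip(*task_vectors):
        # Step 2: an element is kept only if every task vector agrees
        # on its (nonzero) sign; conflicting elements are zeroed out.
        signs = [1 if e > 0 else -1 if e < 0 else 0 for e in elems]
        if signs[0] != 0 and all(s == signs[0] for s in signs):
            merged.append(sum(elems) / k)  # averaging is an assumption
        else:
            merged.append(0.0)
    return merged


def unlearn(pretrained, merged_task_vector, scale=1.0):
    # Step 3: negate the merged task vector, i.e. subtract it from the
    # pretrained weights (task-arithmetic-style negation; `scale` is a
    # hypothetical merging coefficient).
    return [p - scale * m for p, m in zip(pretrained, merged_task_vector)]
```

For example, with two task vectors `[1.0, -2.0, 3.0]` and `[2.0, -1.0, -3.0]`, the first two elements agree in sign and survive as their averages, while the third conflicts and is dropped, giving `[1.5, -1.5, 0.0]`.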