Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
NegMerge: Sign-Consensual Weight Merging for Machine Unlearning
Authors: Hyo Seo Kim, Dongyoon Han, Junsuk Choe
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluations on zero-shot and standard image recognition tasks across twelve datasets and four backbone architectures show that our approach outperforms state-of-the-art methods while requiring similar or fewer computational resources. |
| Researcher Affiliation | Collaboration | Hyo Seo Kim 1 Dongyoon Han 2 Junsuk Choe 1 ... 1Sogang University 2NAVER AI Lab. Correspondence to: Junsuk Choe <EMAIL>, Dongyoon Han <EMAIL>. |
| Pseudocode | No | The paper describes the method in Section 3.2 'The Proposed Method: NegMerge' using detailed textual descriptions for each step (Step 1: Calculating Diverse Task Vectors, Step 2: Identifying Task Vector Elements for Forget Set, Step 3: Final Task Vector for Negation), but does not present it as structured pseudocode or an algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/naver-ai/negmerge. |
| Open Datasets | Yes | In the CLIP scenario, we follow the training and evaluation protocols from the Task Arithmetic paper (Ilharco et al., 2022a). We assess unlearning performance on eight datasets: SUN397 (Xiao et al., 2016), Cars (Krause et al., 2013), RESISC45 (Cheng et al., 2017), EuroSAT (Helber et al., 2019), SVHN (Yuval, 2011), GTSRB (Stallkamp et al., 2011), MNIST (LeCun, 1998), and DTD (Cimpoi et al., 2014), while using ImageNet (Deng et al., 2009) as the retain set to evaluate retaining performance. ... In the standard classifier scenario, we evaluate unlearning performance on CIFAR-10 (Krizhevsky et al., 2009), CUB200-2011 (Wah et al., 2011), and Tiny ImageNet (Le & Yang, 2015)... |
| Dataset Splits | Yes | Table 2 presents a comparison of various unlearning techniques on CIFAR-10 using ResNet-18. In this task, we randomly select 10% of the training set as the forget set. ... We use the accuracies of the retain set Dr, forget set Df, and test set Dtest to evaluate performance. |
| Hardware Specification | No | Most experiments were conducted on the NAVER Smart Machine Learning (NSML) platform (Sung et al., 2017). This mentions a platform but does not specify the underlying hardware like GPU or CPU models. |
| Software Dependencies | No | The paper mentions optimizers (AdamW) and models (ResNet-18, VGG-16, CLIP ViT-{B/32, B/16, L/14}) but does not provide specific version numbers for software dependencies or libraries used. |
| Experiment Setup | Yes | In the CLIP scenario, for fine-tuning, we set the batch size to 128 and use a learning rate of 1e-5 with a cosine annealing schedule. We utilize the AdamW optimizer, applying a weight decay of 0.1. ... In the standard image classifier unlearning scenario, we fine-tune models with varied hyperparameters. For CIFAR-10, we use ResNet-18 with a batch size of 256 and a learning rate of 0.05, and VGG-16 with a batch size of 64 and a learning rate of 0.01. Instead of data augmentation, we adjust training settings, setting the number of epochs to 40, 50, or 60, weight decay to 1e-4, 5e-5, or 1e-5, and label smoothing to 0, 0.05, or 0.1, resulting in 27 fine-tuned models. |
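The three steps quoted in the Pseudocode row (compute diverse task vectors, keep only sign-consensual elements, negate the merged vector) can be sketched in a few lines. This is a minimal illustration reconstructed from the textual description only; the aggregation rule for consensual elements (averaging here) and the final subtraction step are assumptions, not the authors' verified implementation (see their repository for the real code).

```python
def negmerge(task_vectors):
    """Sketch of sign-consensual merging over k flattened task vectors.

    task_vectors: list of equal-length lists of floats, each the
    difference (fine-tuned weights - pretrained weights) from one of
    the diversely fine-tuned models (Step 1).
    Returns one merged task vector (Step 3 input).
    """
    k = len(task_vectors)
    merged = []
    for elems in zip(*task_vectors):
        # Step 2: an element is kept only if every task vector agrees
        # on its (nonzero) sign; conflicting elements are zeroed out.
        signs = [1 if e > 0 else -1 if e < 0 else 0 for e in elems]
        if signs[0] != 0 and all(s == signs[0] for s in signs):
            merged.append(sum(elems) / k)  # averaging is an assumption
        else:
            merged.append(0.0)
    return merged


def unlearn(pretrained, merged_task_vector, scale=1.0):
    # Step 3: negate the merged task vector, i.e. subtract it from the
    # pretrained weights (task-arithmetic-style negation; `scale` is a
    # hypothetical merging coefficient).
    return [p - scale * m for p, m in zip(pretrained, merged_task_vector)]
```

For example, with two task vectors `[1.0, -2.0, 3.0]` and `[2.0, -1.0, -3.0]`, the first two elements agree in sign and survive as their averages, while the third conflicts and is dropped, giving `[1.5, -1.5, 0.0]`.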