Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Approximate Domain Unlearning for Vision-Language Models

Authors: Kodai Kawamura, Yuta Goto, Rintaro Yanagi, Hirokatsu Kataoka, Go Irie

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on four multi-domain benchmark datasets demonstrate that our approach significantly outperforms strong baselines built upon state-of-the-art VLM tuning techniques, paving the way for practical and fine-grained unlearning in VLMs. The paper includes a dedicated '4 Experiments' section detailing the empirical evaluation.
Researcher Affiliation	Academia	The authors are affiliated with Tokyo University of Science, National University of Singapore, National Institute of Advanced Industrial Science and Technology (AIST), and University of Oxford, all of which are academic or public research institutions.
Pseudocode	No	The paper describes the proposed methods, Domain Disentangling Loss (DDL) and Instance-wise Prompt Generator (InstaPG), in Sections 3.3 and 3.4, respectively, using descriptive text and formulas, but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code: https://kodaikawamura.github.io/Domain_Unlearning/
Open Datasets	Yes	Datasets. We evaluate our method on four public multi-domain image classification datasets: ImageNet [Deng et al., 2009], Office-Home [Venkateswara et al., 2017], Mini DomainNet [Zhou et al., 2021], and DomainNet [Peng et al., 2019].
Dataset Splits	Yes	Unless otherwise noted, we use eight labeled samples per domain (both class and domain labels) for training, following the few-shot setting commonly adopted in recent VLM tuning [Zhou et al., 2022b, Khattak et al., 2023, Li et al., 2024, Huang et al., 2024a] and machine unlearning studies [Kuwana et al., 2024].
Hardware Specification	Yes	Table 12 summarizes GPU memory usage and training time with an NVIDIA RTX A4000 GPU on Office-Home.
Software Dependencies	No	The paper mentions using a pre-trained CLIP model and a ViT-B/16 image encoder, but does not provide specific software dependencies with version numbers (e.g., PyTorch version, Python version, CUDA version) needed for replication.
Experiment Setup	Yes	Implementation Details. We use a pre-trained CLIP model with ViT-B/16 [Dosovitskiy et al., 2021] as the image encoder. The text prompt is set to "a photo of a [class]". For vision prompts, we adopt deep prompting [Khattak et al., 2023] with eight learnable context tokens and train the model for 50 epochs using SGD with a learning rate of 0.0025. The vision prompts are optimized within the first nine transformer layers of the image encoder. We consistently set the weights in loss functions γ = 30 and λ = 10.
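
For readers attempting a replication, the values quoted in the Dataset Splits and Experiment Setup rows can be collected into a single configuration object. The sketch below is illustrative only and is not taken from the authors' code: the class name ReportedSetup, its field names, and the helpers render_prompt and sample_per_domain are all hypothetical. The quoted text does not say which loss terms γ and λ weight, so they are stored as plain numbers, and the sampling helper is only one plausible reading of "eight labeled samples per domain".

```python
import random
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; all field names are hypothetical."""
    image_encoder: str = "ViT-B/16"          # CLIP image encoder
    text_prompt: str = "a photo of a [class]"
    num_context_tokens: int = 8              # learnable vision context tokens
    prompt_depth: int = 9                    # first nine transformer layers
    epochs: int = 50
    optimizer: str = "SGD"
    learning_rate: float = 0.0025
    gamma: float = 30.0                      # loss weight (term not specified here)
    lam: float = 10.0                        # loss weight (term not specified here)
    samples_per_domain: int = 8              # few-shot training samples


def render_prompt(template: str, class_name: str) -> str:
    """Fill the [class] placeholder of the text prompt."""
    return template.replace("[class]", class_name)


def sample_per_domain(items, k, seed=0):
    """Pick up to k sample ids per domain from (sample_id, domain) pairs.

    An assumed sampling scheme for illustration, not the authors' procedure.
    """
    rng = random.Random(seed)
    by_domain = defaultdict(list)
    for sample_id, domain in items:
        by_domain[domain].append(sample_id)
    return {
        domain: rng.sample(ids, min(k, len(ids)))
        for domain, ids in sorted(by_domain.items())
    }


setup = ReportedSetup()
```

A call such as render_prompt(setup.text_prompt, "dog") fills the placeholder with a class name, and sample_per_domain(pairs, setup.samples_per_domain) draws a seeded few-shot subset from any list of (sample_id, domain) pairs.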