Learning Where to Edit Vision Transformers
Authors: Yunqiao Yang, Long-Kai Huang, Shengzhuang Chen, Kede Ma, Ying Wei
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our method, we construct an editing benchmark that introduces subpopulation shifts towards natural underrepresented images and AI-generated images, thereby revealing the limitations of pre-trained ViTs for object recognition. Our approach not only achieves superior performance on the proposed benchmark but also allows for adjustable trade-offs between generalization and locality. |
| Researcher Affiliation | Collaboration | Yunqiao Yang¹, Long-Kai Huang², Shengzhuang Chen¹, Kede Ma¹, Ying Wei³ — ¹City University of Hong Kong, ²Tencent AI Lab, ³Zhejiang University |
| Pseudocode | Yes | Algorithm 1 presents the pseudo-code of our method. |
| Open Source Code | Yes | Our code is available at https://github.com/hustyyq/Where-to-Edit. |
| Open Datasets | Yes | To build the natural image subset, we first compile a large dataset of unlabeled images, denoted as U, from Flickr, by leveraging keywords relevant to the object categories in ImageNet-1k [10]. We adopt Textual Inversion [56] and PUG [5] to construct the AI-generated image subset, encompassing the oil painting and stage light shifts, respectively. |
| Dataset Splits | Yes | Using the validation set from ImageNet-1k as Dl does not adequately examine locality, as the majority are easy samples that lie far from the decision boundary [16]. To more closely examine the adverse effects of model editing, we have carefully curated 2,071 images near the decision boundary of the base model from the validation sets of ImageNet-1k [47], ImageNet-R [25], and ImageNet-Sketch [57], whose predictions are more susceptible to change. |
| Hardware Specification | Yes | Training a hypernetwork for the base ViT/B-16 takes approximately 9 hours on a single RTX A6000 GPU (48 GB). |
| Software Dependencies | No | The paper mentions optimizers like Adam and RMSProp and references their theoretical basis, but does not provide specific version numbers for these or other key software components or libraries used in the implementation. |
| Experiment Setup | Yes | We set the learning rate in the inner loop as 0.001, and perform gradient descent for five steps (i.e., T = 5). In the outer loop, we apply the Adam optimizer with a learning rate of 0.1 to optimize m from random initialization for a total of ten steps. For the hypernetwork optimization, RMSProp is utilized with a learning rate of 1e-4, a minibatch size of eight, and a maximum iteration number of 7,000. (A hedged sketch of these settings follows the table.) |
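
The experiment-setup row above describes a bi-level scheme: an inner loop that fine-tunes masked parameters, an outer loop that optimizes the mask m, and a meta-training phase for the hypernetwork. The following is a minimal PyTorch-style sketch of those reported settings; only the numeric hyperparameters are taken from the paper, while names such as `inner_loop`, `edit_loss_fn`, `outer_loss_fn`, and `hnet` are illustrative placeholders rather than the authors' released implementation (see their repository for the actual code).

```python
# Hedged sketch of the reported hyperparameters; placeholder callables stand in
# for the paper's editing loss and generalization/locality objective.
import torch

INNER_LR = 1e-3     # inner-loop learning rate
INNER_STEPS = 5     # T = 5 gradient-descent steps
OUTER_LR = 0.1      # Adam learning rate for the mask m
OUTER_STEPS = 10    # outer-loop steps
HNET_LR = 1e-4      # RMSProp learning rate for the hypernetwork
BATCH_SIZE = 8      # minibatch size for hypernetwork training
MAX_ITERS = 7_000   # maximum number of hypernetwork iterations


def inner_loop(params, mask, edit_loss_fn):
    """Fine-tune only the mask-selected parameters for T steps (inner loop)."""
    for _ in range(INNER_STEPS):
        loss = edit_loss_fn(params)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        # Gradient descent restricted to entries selected by the (soft) mask.
        params = [p - INNER_LR * m * g for p, m, g in zip(params, mask, grads)]
    return params


def learn_mask(params, shapes, edit_loss_fn, outer_loss_fn):
    """Outer loop: optimize the mask m from random initialization with Adam."""
    m = [torch.rand(s, requires_grad=True) for s in shapes]
    opt = torch.optim.Adam(m, lr=OUTER_LR)
    for _ in range(OUTER_STEPS):
        edited = inner_loop(params, m, edit_loss_fn)
        loss = outer_loss_fn(edited)  # placeholder generalization/locality objective
        opt.zero_grad()
        loss.backward()
        opt.step()
    return m


def train_hypernetwork(hnet, loader, params, edit_loss_fn, outer_loss_fn):
    """Meta-train the hypernetwork with RMSProp on minibatches of eight."""
    opt = torch.optim.RMSprop(hnet.parameters(), lr=HNET_LR)
    for _, batch in zip(range(MAX_ITERS), loader):  # loader yields batches of size 8
        mask = hnet(batch)                          # predicted (soft) mask over params
        edited = inner_loop(params, mask, lambda p: edit_loss_fn(p, batch))
        loss = outer_loss_fn(edited)
        opt.zero_grad()
        loss.backward()
        opt.step()
```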