Transformer-Patcher: One Mistake Worth One Neuron

Authors: Zeyu Huang, Yikang Shen, Xiaofeng Zhang, Jie Zhou, Wenge Rong, Zhang Xiong

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both classification and generation tasks show that Transformer-Patcher can successively correct up to thousands of errors (Reliability) and generalize to their equivalent inputs (Generality) while retaining the model's accuracy on irrelevant inputs (Locality).
Researcher Affiliation | Collaboration | Zeyu Huang (1,2), Yikang Shen (4), Xiaofeng Zhang (1,2), Jie Zhou (5), Wenge Rong (1,3), Zhang Xiong (1,3); (1) State Key Laboratory of Software Development Environment, Beihang University, China; (2) Sino-French Engineer School, Beihang University, China; (3) School of Computer Science and Engineering, Beihang University, China; (4) Mila, University of Montreal, Canada; (5) WeChat AI, Tencent Inc., China
Pseudocode | No | Appendix A describes the "Multiple Neuron Patching" principle using equations and textual explanations, but it does not provide a structured pseudocode block or algorithm.
Open Source Code | Yes | The code is available at https://github.com/ZeroYuHuang/Transformer-Patcher.
Open Datasets | Yes | For FC, we apply a BERT base model (Devlin et al., 2019) and the FEVER dataset (Thorne et al., 2018). For QA, we apply a BART base model (Lewis et al., 2020) and the Zero-Shot Relation Extraction (zsRE) dataset (Levy et al., 2017). We directly use the equivalent set released by Cao et al. (2021).
Dataset Splits | Yes | We first split the original D_train into an edit set D_edit and a new training set D'_train. ... For closed-book fact-checking, ... split the original training data into three subsets: a new training set D'_train, a new validation set D_val and an edit set D_edit in the ratio of 0.8 : 0.1 : 0.1. ... For closed-book question answering, ... employ the same data split process as FEVER in the ratio of 0.9 : 0.075 : 0.025. (A minimal split sketch follows this table.)
Hardware Specification | Yes | Using a V100, one edit costs only 7.1s for FC and 18.9s for QA. ... we run SME experiment n=20 times on n different edit folders simultaneously using 8 NVIDIA Tesla V100 GPUs.
Software Dependencies | No | The paper mentions that "Adam optimizer (Kingma & Ba, 2015) is applied for both tasks" but does not provide specific version numbers for software libraries, programming languages, or frameworks (e.g., Python, PyTorch, TensorFlow), or other dependencies.
Experiment Setup | Yes | The initial learning rate is set as 0.01. The Adam optimizer (Kingma & Ba, 2015) is applied for both tasks. Every patch is initialized with the normalized corresponding query q_e / ||q_e||_2. ... The parameter k_a mentioned in equation 30 is set as 5, and the parameter k for the memory loss is set as 1000. (A hedged patch-initialization sketch follows this table.)
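
To make the split procedure quoted in the Dataset Splits row concrete, below is a minimal sketch of a three-way random split with the reported ratios (0.8 : 0.1 : 0.1 for FEVER/FC, 0.9 : 0.075 : 0.025 for zsRE/QA). The function name `three_way_split`, the shuffling strategy, and the seed are illustrative assumptions, not the authors' released preprocessing code.

```python
# Hedged sketch of the D'_train / D_val / D_edit split described above.
# Names and the shuffle-then-slice strategy are assumptions for illustration.
import random

def three_way_split(examples, train_frac, val_frac, edit_frac, seed=0):
    assert abs(train_frac + val_frac + edit_frac - 1.0) < 1e-9
    examples = list(examples)
    random.Random(seed).shuffle(examples)
    n = len(examples)
    n_train = int(n * train_frac)
    n_val = int(n * val_frac)
    d_train = examples[:n_train]                 # D'_train
    d_val = examples[n_train:n_train + n_val]    # D_val
    d_edit = examples[n_train + n_val:]          # D_edit (remainder)
    return d_train, d_val, d_edit

# FC (FEVER):  fc_train, fc_val, fc_edit = three_way_split(fever_train, 0.8, 0.1, 0.1)
# QA (zsRE):   qa_train, qa_val, qa_edit = three_way_split(zsre_train, 0.9, 0.075, 0.025)
```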
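The Experiment Setup row states that each patch is initialized with the normalized query q_e / ||q_e||_2 and trained with Adam at a learning rate of 0.01. The PyTorch sketch below shows one way such a single-neuron patch could be attached to a frozen FFN; the `PatchedFFN` wrapper, the stand-in `nn.Linear` base FFN, the ReLU activation, and the tensor shapes are assumptions for illustration and do not reproduce the authors' implementation.

```python
# Minimal sketch of "one mistake, one neuron" patch initialization, assuming a
# simple FFN wrapper: the extra key is set to q_e / ||q_e||_2 and only the patch
# parameters are optimized with Adam at the reported learning rate of 0.01.
import torch
import torch.nn as nn

class PatchedFFN(nn.Module):
    def __init__(self, ffn: nn.Module, d_model: int, q_e: torch.Tensor):
        super().__init__()
        self.ffn = ffn                                     # frozen original FFN (placeholder here)
        self.patch_key = nn.Parameter(q_e / q_e.norm(p=2)) # key initialized as q_e / ||q_e||_2
        self.patch_bias = nn.Parameter(torch.zeros(1))
        self.patch_value = nn.Parameter(torch.zeros(d_model))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # activation of the single extra neuron, added on top of the unchanged FFN output
        a = torch.relu(h @ self.patch_key + self.patch_bias)
        return self.ffn(h) + a.unsqueeze(-1) * self.patch_value

# d_model, q_e, and the nn.Linear base FFN are placeholders for the edited model.
d_model = 768
q_e = torch.randn(d_model)
patched = PatchedFFN(nn.Linear(d_model, d_model), d_model, q_e)
out = patched(torch.randn(2, 4, d_model))  # (batch, seq, d_model)

optimizer = torch.optim.Adam(
    [patched.patch_key, patched.patch_bias, patched.patch_value], lr=0.01
)
```

Freezing the base FFN and optimizing only the three patch parameters mirrors the paper's claim that one edit is cheap (seconds on a V100), since the gradient touches a single added neuron rather than the full model.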