Latent Paraphrasing: Perturbation on Layers Improves Knowledge Injection in Language Models
Authors: Minki Kang, Sung Ju Hwang, Gibbeum Lee, Jaewoong Cho
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments on question-answering benchmarks demonstrate that LaPael improves knowledge injection over standard fine-tuning and existing noise-based approaches. |
| Researcher Affiliation | Collaboration | Minki Kang (KRAFTON, KAIST), Sung Ju Hwang (KAIST), Gibbeum Lee (KRAFTON), Jaewoong Cho (KRAFTON) |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | We do not open-source the code yet. However, we will open-source it if the paper is accepted. |
| Open Datasets | Yes | We mainly use the test split of three QA datasets: SQuAD [38], StreamingQA [27], and ArchivalQA [51] for the source of D_K and D_QA in our main experiments. |
| Dataset Splits | Yes | Table 11: Dataset statistics. We report the size of D_train, D_K, and D_QA used in our experiments. For SQuAD, D_train is 1,000, D_K is 1,000, and D_QA is 1,000. |
| Hardware Specification | Yes | We use 4 A100 GPUs for fine-tuning LLMs. |
| Software Dependencies | Yes | We mainly use Vicuna-7b-v1.5 [56] for fine-tuning, which is the instruction-tuned version of Llama-2-7b [48], for our experiments. We also verify with Mistral-7B-Instruct-v0.2 [18] and Phi-3-mini-4k-instruct [1]. |
| Experiment Setup | Yes | We fine-tune LLMs for 12 epochs with a learning rate of 0.00005 and a step learning rate scheduler that decays the learning rate by a factor of 0.85 every 4 epochs. For the optimizer, we use AdamW [28]. ... We use 5 latent paraphrasers on the 5 sequential early layers of LLMs. For Equation (13), we use N = 4. For Equation (14), we use K = 10. For Equation (15), we set r = 0.5. |
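
For context, the Software Dependencies row names the base models used for fine-tuning. The snippet below is a minimal sketch of loading them with Hugging Face `transformers`; the checkpoint identifiers and the helper `load_base_model` are assumptions mapped from the model names quoted above, not code released by the authors.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hugging Face checkpoint identifiers assumed to correspond to the models
# named in the paper (the authors do not release loading code).
MODEL_IDS = [
    "lmsys/vicuna-7b-v1.5",               # instruction-tuned Llama-2-7B
    "mistralai/Mistral-7B-Instruct-v0.2",
    "microsoft/Phi-3-mini-4k-instruct",
]

def load_base_model(model_id: str):
    """Load a base LLM and its tokenizer for fine-tuning."""
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto")
    return model, tokenizer
```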
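
The Experiment Setup row translates directly into a standard PyTorch optimizer and scheduler configuration. The sketch below assumes plain `torch.optim` APIs; the helper `build_optimizer_and_scheduler` and the commented training loop are illustrative placeholders, not the authors' implementation of LaPael.

```python
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import StepLR

# Hyperparameters as reported in the Experiment Setup row above.
NUM_EPOCHS = 12
LEARNING_RATE = 5e-5
LR_DECAY_FACTOR = 0.85   # multiply the learning rate by 0.85 ...
LR_DECAY_EVERY = 4       # ... every 4 epochs

def build_optimizer_and_scheduler(model: torch.nn.Module):
    """AdamW with a step learning-rate schedule matching the reported setup."""
    optimizer = AdamW(model.parameters(), lr=LEARNING_RATE)
    scheduler = StepLR(optimizer, step_size=LR_DECAY_EVERY, gamma=LR_DECAY_FACTOR)
    return optimizer, scheduler

# Skeleton of the outer loop; `train_one_epoch` is a hypothetical placeholder
# for the fine-tuning step with latent paraphrasers on the early layers.
# for epoch in range(NUM_EPOCHS):
#     train_one_epoch(model, optimizer)
#     scheduler.step()   # step once per epoch so the decay lands every 4 epochs
```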