Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Minimal Targeted Updates of Language Models with Targeted Negative Training
Authors: Lily H Zhang, Rajesh Ranganath, Arya Tafvizi
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider two use cases for targeted negative training: reducing hallucinations and toxicity. All experiments utilize T5 base (220M parameters). First, we finetune T5 on the original training set. Then, we generate from the model given training and validation inputs and annotate the generations. Next, we use the annotated generations to update the model. To evaluate, we compute the prevalence of the unwanted behavior among the new model's generations on the test inputs, as well as similarity between the old and new models' generations. |
| Researcher Affiliation | Collaboration | Lily H. Zhang, New York University; Rajesh Ranganath, New York University; Arya Tafvizi, Google |
| Pseudocode | Yes | Algorithm 1 Targeted Negative Training. 1: Input: initial model p_o (already trained), inputs {c}_1^n, model outputs {x}_1^n, token annotations {a}_1^n denoting whether x_t ∈ supp(p_neg(· \| c, x_{<t})). 2: p_m ← p_o. 3: for each iteration do 4: Get p_m(· \| c, x_{<t}) for all c, x_{<t} in batch (forward pass of p_m). 5: Get p_o(· \| c, x_{<t}) for all c, x_{<t} in batch (forward pass of p_o). 6: Compute p_new(· \| c, x_{<t}) for all c, x_{<t} in batch (Equation (2)). 7: Calculate TNT loss (Equation (4)). 8: Calculate gradients for weights in p_m and update p_m. 9: end for 10: Return p_m |
| Open Source Code | Yes | Code for tnt can be found at https://github.com/google/t5patches. |
| Open Datasets | Yes | We use the XSUM dataset (Narayan et al., 2018) for the reducing hallucination task and Civil Comments (Borkan et al., 2019) for the reducing offensive phrases task. ... To label text spans as toxic, we train a token-level toxicity classifier on the Civil Comments Spans dataset (Pavlopoulos et al., 2021). |
| Dataset Splits | Yes | For the hallucination experiment, we use the XSUM train, validation, and test splits. The dataset sizes for train, validation, and test are 203,577, 11,305, and 11,301. ... The resulting train, validation, and test (unused) sets are of size 175,754, 21,974, and 22,009. |
| Hardware Specification | Yes | For all experiments, we use Google Cloud v4 TPU pods. |
| Software Dependencies | No | The paper mentions using "Spacy's CNN-based named entity recognition (NER) model" but does not provide a specific version number. No other specific software dependencies with version numbers are mentioned. |
| Experiment Setup | Yes | For all runs, we use a batch size of 32, dropout rate of 0.1, and no label smoothing. For all runs, the cross entropy loss includes the square of the logsumexp of the logits as a penalty, scaled by a factor of 0.0001. ... For the initial finetuning, we train a base T5 model with learning rate 1e-3 and select the best checkpoint every 10,000 steps based on validation loss. Our resulting models are finetuned for 30,000 steps on XSUM and 40,000 steps on Civil Comments. For the updates and alternative finetuning, we run a sweep across four learning rates (1e-3, 1e-4, 1e-5, 1e-6) and choose the best model per every 1,000 steps based on validation loss. We run updates for a total of 100,000 steps for the T5 model, and 200,000 steps for the PaLM-2 1B model. The learning rates used for the various methods are as follows: (Table 2 lists the specific learning rates per method). |
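The Algorithm 1 excerpt above can be sketched as a loss function. The following is a minimal NumPy illustration, not the authors' t5patches implementation; it assumes Equation (2) forms p_new by zeroing the annotated negative tokens in p_o and renormalizing, and that Equation (4) is the cross entropy between p_new and p_m. All function and variable names are hypothetical.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def tnt_loss(logits_m, logits_o, neg_mask):
    """Sketch of the TNT objective in Algorithm 1.

    logits_m: (B, T, V) logits of the model being updated, p_m
    logits_o: (B, T, V) logits of the frozen initial model, p_o
    neg_mask: (B, T, V) boolean, True for vocabulary entries annotated
              as unwanted at that position (x_t in supp(p_neg))
    """
    p_o = softmax(logits_o)
    p_new = p_o * (~neg_mask)                       # zero out negative tokens (Eq. 2)
    p_new = p_new / p_new.sum(-1, keepdims=True)    # renormalize to a distribution
    log_p_m = np.log(softmax(logits_m) + 1e-12)
    # cross entropy between the target p_new and the updated model p_m (Eq. 4)
    return -(p_new * log_p_m).sum(-1).mean()
```

In a real training loop, the gradient of this loss with respect to the parameters producing `logits_m` would be taken (steps 7–8 of the algorithm), while `logits_o` stays frozen.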
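The Experiment Setup row mentions that the cross entropy loss adds "the square of the logsumexp of the logits as a penalty, scaled by a factor of 0.0001". This auxiliary term is commonly called the z-loss in T5-style training; a small NumPy sketch (function name hypothetical):

```python
import numpy as np

def logit_penalty(logits, scale=1e-4):
    """Auxiliary penalty: scale * logsumexp(logits)**2, averaged
    over positions, as described in the experiment setup."""
    m = logits.max(axis=-1, keepdims=True)
    lse = m.squeeze(-1) + np.log(np.exp(logits - m).sum(axis=-1))
    return scale * np.mean(lse ** 2)
```

The penalty discourages the logits from drifting to large magnitudes, which helps keep softmax computations numerically stable during long finetuning runs.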