Temporal Robustness against Data Poisoning

Authors: Wenxiao Wang, Soheil Feizi

NeurIPS 2023

Reproducibility Variable Result LLM Response
Research Type Experimental We present a benchmark with an evaluation protocol simulating continuous data collection and periodic deployments of updated models, thus enabling empirical evaluation of temporal robustness. Lastly, we develop and also empirically verify a baseline defense, namely temporal aggregation, offering provable temporal robustness and highlighting the potential of our temporal threat model for data poisoning.
Researcher Affiliation Academia Wenxiao Wang, Department of Computer Science, University of Maryland, College Park, MD 20742, wwx@umd.edu; Soheil Feizi, Department of Computer Science, University of Maryland, College Park, MD 20742, sfeizi@umd.edu
Pseudocode No The paper defines "Temporal Aggregation" with a mathematical formula in Definition 4.1, but it does not present it as a structured pseudocode block or algorithm.
Open Source Code No The paper does not contain an explicit statement about the release of its source code or a link to a code repository for the methodology described.
Open Datasets Yes We use News Category Dataset [20, 21] as the base of our benchmark, which contains news headlines from 2012 to 2022 published on HuffPost (https://www.huffpost.com). Rishabh Misra. News category dataset. CoRR, abs/2209.11429, 2022. doi: 10.48550/ARXIV.2209.11429. URL https://doi.org/10.48550/arXiv.2209.11429.
Dataset Splits No The paper describes a temporal split for training and testing data, where models are trained on all previous months' data and tested on the current month's data ("when predicting the categories of samples from the i-th month, the model to be used will be the one trained from only samples of previous months (i.e. from the 0-th to the (i−1)-th month)"). However, it does not specify a separate validation dataset split.
Hardware Specification No The paper does not provide specific details about the hardware used for running experiments, such as GPU/CPU models or memory specifications.
Software Dependencies No The paper mentions using "pre-trained RoBERTa [17]", "AdamW optimizer [18]", and "PyTorch [24]" but does not provide specific version numbers for these software dependencies, which is required for reproducibility.
Experiment Setup Yes We optimize a linear classification head over normalized RoBERTa features with AdamW optimizer [18] for 50 epochs, using a learning rate of 1e-3 and a batch size of 256. To make the base learner deterministic, we set the random seeds explicitly and disable all nondeterministic features from PyTorch [24].
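The paper gives temporal aggregation only as a formula (Definition 4.1), not as pseudocode. As a rough sketch, assuming the definition amounts to a majority vote over base classifiers trained on different temporal windows of the data stream (the function name, tie-breaking rule, and toy models below are illustrative assumptions, not the paper's notation):

```python
from collections import Counter

def temporal_aggregation(base_models, x):
    """Majority vote over base models, each imagined as trained on a
    different temporal window of the stream (hypothetical sketch of
    Definition 4.1); ties are broken toward the smaller label."""
    votes = Counter(model(x) for model in base_models)
    return min(votes.items(), key=lambda kv: (-kv[1], kv[0]))[0]

# Toy base learners: constant classifiers standing in for models
# trained on successive portions of the stream.
models = [lambda x: 0, lambda x: 1, lambda x: 1]
temporal_aggregation(models, x=None)  # -> 1
```

The robustness intuition is that poisoning confined to a bounded time span can corrupt only some of the base models, so the vote of the remainder can still carry the prediction.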
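The evaluation protocol quoted under Dataset Splits (train on months 0..i−1, test on month i) can be sketched as a rolling loop; the helper names and the toy majority-label "model" below are illustrative assumptions, not the paper's code:

```python
from collections import Counter

def rolling_evaluation(monthly_batches, train_fn, eval_fn):
    """For each month i >= 1, train on all earlier months and
    evaluate on month i (hypothetical sketch of the protocol)."""
    scores = []
    for i in range(1, len(monthly_batches)):
        history = [ex for month in monthly_batches[:i] for ex in month]
        model = train_fn(history)
        scores.append(eval_fn(model, monthly_batches[i]))
    return scores

# Toy stand-ins: the "model" is just the majority label seen so far.
majority = lambda data: Counter(y for _, y in data).most_common(1)[0][0]
accuracy = lambda label, batch: sum(y == label for _, y in batch) / len(batch)
```

This rolling structure is what makes the benchmark temporal: each deployed model only ever sees data collected before its evaluation month.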
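The reported setup (a linear head over normalized RoBERTa features, AdamW, 50 epochs, lr 1e-3, batch size 256, explicit seeds, nondeterminism disabled) could be reconstructed roughly as follows; the function names, the feature dimensionality, and the training loop details are assumptions, not the authors' released code:

```python
import torch
from torch import nn

def make_deterministic(seed=0):
    # Fix the seed and disable nondeterministic PyTorch kernels,
    # matching the determinism measures the report describes.
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)

def train_linear_head(features, labels, num_classes,
                      epochs=50, lr=1e-3, batch_size=256):
    """Optimize a linear head over L2-normalized (assumed precomputed
    RoBERTa) features with AdamW, per the reported hyperparameters."""
    features = nn.functional.normalize(features, dim=1)
    head = nn.Linear(features.shape[1], num_classes)
    opt = torch.optim.AdamW(head.parameters(), lr=lr)
    for _ in range(epochs):
        for start in range(0, len(features), batch_size):
            x = features[start:start + batch_size]
            y = labels[start:start + batch_size]
            loss = nn.functional.cross_entropy(head(x), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head
```

Since the feature extractor is frozen, each base model is cheap to train, which is what makes training one model per temporal window practical.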