DeRDaVa: Deletion-Robust Data Valuation for Machine Learning

Authors: Xiao Tian, Rachael Hwee Ling Sim, Jue Fan, Bryan Kian Hsiang Low

AAAI 2024

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | We also empirically demonstrate the practicality of our solutions. |
| Researcher Affiliation | Academia | Xiao Tian1,2, Rachael Hwee Ling Sim1, Jue Fan1,2, Bryan Kian Hsiang Low1; 1 Department of Computer Science, National University of Singapore; 2 Department of Mathematics, National University of Singapore; {xiao.tian, rachael.sim, jue.fan}@u.nus.edu, lowkh@comp.nus.edu.sg |
| Pseudocode | Yes | The justification and pseudocode for 012-MCMC algorithm are included in App. D.2. |
| Open Source Code | No | The paper does not include any statement or link providing access to the open-source code for the methodology described. |
| Open Datasets | Yes | Our experiments use the following [model-dataset] combinations: [NB-CC] Naive Bayes trained on Credit Card (Yeh and Lien 2009), [NB-Db] Naive Bayes trained on Diabetes (Carrion, Dustin 2022), [NB-Wd] Naive Bayes trained on Wind (Vanschoren, Joaquin 2014), [SVM-Db] Support Vector Machine trained on Diabetes, and [LR-Pm] Logistic Regression trained on Phoneme (Grin, Leo 2022). |
| Dataset Splits | No | While the paper mentions "validation accuracy" in a general definition, it does not specify the explicit training/validation/test splits used for its own experiments, such as percentages or sample counts for a validation set. |
| Hardware Specification | Yes | The experiments are performed on a 64-bit Linux server with 256GB RAM and two Intel Xeon E5-2690 CPUs. |
| Software Dependencies | Yes | We implemented our solutions using Python 3.9.7 with scikit-learn 1.0.2. |
| Experiment Setup | Yes | For all experiments, we used Adam optimizer with learning rate 0.001 and batch size 64. The model training terminates when the validation loss does not improve for 10 epochs or after a maximum of 100 epochs. |
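For readers checking reproducibility, the quoted setup (Adam optimizer, learning rate 0.001, batch size 64, early stopping with patience 10, at most 100 epochs) happens to map directly onto parameters of scikit-learn's `MLPClassifier`, the library version the paper cites. The sketch below is only an illustration of that mapping, not the authors' code: the paper's actual models are Naive Bayes, SVM, and logistic regression, the dataset here is synthetic, and scikit-learn's early stopping monitors validation *score* rather than validation loss.

```python
# Hedged sketch only: reproduces the quoted training hyperparameters with
# scikit-learn's MLPClassifier on a synthetic dataset. This is NOT the
# authors' experiment code; model and data choices are assumptions.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = MLPClassifier(
    solver="adam",             # Adam optimizer
    learning_rate_init=0.001,  # learning rate 0.001
    batch_size=64,             # batch size 64
    early_stopping=True,       # holds out an internal validation split
    n_iter_no_change=10,       # stop after 10 epochs without improvement
    max_iter=100,              # at most 100 epochs
    random_state=0,
)
clf.fit(X_train, y_train)
print(f"epochs run: {clf.n_iter_}, test accuracy: {clf.score(X_test, y_test):.3f}")
```

Note the remaining gap flagged in the table: since the paper does not report its train/validation/test split sizes, the split above (`train_test_split` defaults) is arbitrary.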