Training Set Debugging Using Trusted Items

Authors: Xuezhou Zhang, Xiaojin Zhu, Stephen Wright

AAAI 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on toy and real data demonstrate that our approach can identify training set bugs effectively and suggest appropriate changes to the labels.
Researcher Affiliation | Academia | Xuezhou Zhang, Xiaojin Zhu, and Stephen Wright ({zhangxz1123, jerryzhu, swright}@cs.wisc.edu), Department of Computer Sciences, University of Wisconsin-Madison
Pseudocode | Yes | Algorithm 1: DUTI. (A hedged sketch of the trusted-items idea appears after this table.)
Open Source Code | Yes | All code and data are published at http://pages.cs.wisc.edu/~jerryzhu/DUTI.
Open Datasets | Yes | We study the UCI German Loan data set, which has been used in recent work on algorithmic fairness (Zemel et al. 2013; Feldman et al. 2015). Another dataset often used in algorithmic fairness is UCI Adult Income (Kohavi 1996; Kamishima, Akaho, and Sakuma 2011). In this section, we evaluate the debugging methods on a 10-class handwritten digit recognition problem (Mathworks 2017).
Dataset Splits | Yes | In all our experiments, the learner's hyperparameters are set by 10-fold cross validation on the original training data, and confidence levels on all trusted items c are set to 100. (A sketch of this selection step also appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper does not name ancillary software dependencies (e.g., specific libraries or solvers) with version numbers.
Experiment Setup | No | The paper states that 'the learner's hyperparameters are set by 10-fold cross validation', but it does not provide concrete hyperparameter values (e.g., learning rate, batch size, specific kernel parameters) or detailed training configurations in the main text.
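
The Pseudocode row points to Algorithm 1 (DUTI), which uses a small set of trusted items to flag buggy training labels and suggest corrections. The snippet below is only a rough illustration of that idea, not the paper's algorithm: it retrains a logistic-regression learner with each training label flipped in turn and flags the points whose flip most reduces the loss on the trusted items. The toy data, the choice of learner, and the flip-and-retrain scoring are all assumptions made for this sketch.

```python
# Rough sketch (NOT the paper's DUTI algorithm): flag suspicious training
# labels by checking whether flipping a point's label improves a learner's
# fit on a small trusted set. Data and model choices here are illustrative.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

rng = np.random.default_rng(0)

# Toy data: 200 training points with 20 injected label bugs, 20 trusted items.
X_train = rng.normal(size=(200, 2))
y_train = (X_train[:, 0] + X_train[:, 1] > 0).astype(int)
noisy_idx = rng.choice(200, size=20, replace=False)
y_train[noisy_idx] ^= 1                      # inject label bugs
X_trust = rng.normal(size=(20, 2))
y_trust = (X_trust[:, 0] + X_trust[:, 1] > 0).astype(int)

def trusted_loss(y_candidate):
    """Train on candidate labels, return log-loss on the trusted items."""
    clf = LogisticRegression().fit(X_train, y_candidate)
    return log_loss(y_trust, clf.predict_proba(X_trust), labels=[0, 1])

base = trusted_loss(y_train)
scores = []
for i in range(len(y_train)):                # benefit of flipping each label
    y_flip = y_train.copy()
    y_flip[i] ^= 1
    scores.append(base - trusted_loss(y_flip))

suspects = np.argsort(scores)[::-1][:20]     # largest improvement first
print("flagged:", sorted(suspects))
print("true bugs:", sorted(noisy_idx))
```

This brute-force loop retrains the model once per training point and is only meant to make the trusted-items idea concrete; consult Algorithm 1 in the paper and the published code at the URL above for the actual method.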
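The Dataset Splits and Experiment Setup rows report that hyperparameters were chosen by 10-fold cross validation on the original training data, without listing concrete values. A minimal sketch of such a selection step with scikit-learn follows; the learner, the regularization grid, and the use of the scikit-learn digits data are assumptions for illustration, not the paper's configuration.

```python
# Minimal sketch of 10-fold cross-validation hyperparameter selection.
# The learner and grid are assumed; the paper does not list concrete values.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

X, y = load_digits(return_X_y=True)            # stand-in for the digits task
grid = GridSearchCV(
    LogisticRegression(max_iter=1000),
    param_grid={"C": [0.01, 0.1, 1.0, 10.0]},   # assumed regularization grid
    cv=10,                                      # 10-fold CV as reported
)
grid.fit(X, y)
print("selected hyperparameters:", grid.best_params_)
```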