Training Set Debugging Using Trusted Items
Authors: Xuezhou Zhang, Xiaojin Zhu, Stephen Wright
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on toy and real data demonstrate that our approach can identify training set bugs effectively and suggest appropriate changes to the labels. |
| Researcher Affiliation | Academia | Xuezhou Zhang, Xiaojin Zhu, Stephen Wright ({zhangxz1123, jerryzhu, swright}@cs.wisc.edu), Department of Computer Sciences, University of Wisconsin-Madison |
| Pseudocode | Yes | Algorithm 1: DUTI |
| Open Source Code | Yes | All code and data are published at http://pages.cs.wisc.edu/~jerryzhu/DUTI. |
| Open Datasets | Yes | We study the UCI German Loan data set, which has been used in recent work on algorithmic fairness (Zemel et al. 2013; Feldman et al. 2015). Another dataset often used in algorithmic fairness is UCI Adult Income (Kohavi 1996; Kamishima, Akaho, and Sakuma 2011). In this section, we evaluate the debugging methods on a 10-class handwritten digit recognition problem (Mathworks 2017). |
| Dataset Splits | Yes | In all our experiments, the learner's hyperparameters are set by 10-fold cross validation on the original training data, and confidence levels on all trusted items c are set to 100. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not name its ancillary software dependencies (e.g., libraries or solvers) or give their version numbers. |
| Experiment Setup | No | The paper states that 'the learner's hyperparameters are set by 10-fold cross validation', but it does not provide concrete hyperparameter values (e.g., learning rate, batch size, specific kernel parameters) or detailed training configurations in the main text. |
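
The Dataset Splits and Experiment Setup rows note that hyperparameters were chosen by 10-fold cross-validation on the original training data, but not which values were searched. A reproduction would have to pick its own protocol; the sketch below shows one hypothetical setup in scikit-learn, where the estimator, the grid values, and the stand-in digits dataset are all assumptions rather than values from the paper.

```python
# A minimal sketch of the quoted protocol ("hyperparameters are set by 10-fold
# cross validation on the original training data"). The estimator, grid values,
# and stand-in dataset below are illustrative assumptions; the paper does not
# report which learner settings were searched.
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Stand-in data: scikit-learn's 10-class digits set (the paper uses a MATLAB
# handwritten-digit dataset; this is only a convenient substitute).
X_train, y_train = load_digits(return_X_y=True)

# Hypothetical search grid for an RBF-kernel SVM.
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=10, n_jobs=-1)
search.fit(X_train, y_train)
print("selected hyperparameters:", search.best_params_)
```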
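
The Pseudocode row cites Algorithm 1 (DUTI). As the quoted abstract sentence suggests, the core idea is to propose small changes to the training labels so that a learner retrained on the adjusted labels agrees with a set of trusted items. The code below is a hypothetical, simplified sketch of that general idea only, not the paper's Algorithm 1: the ridge-regression learner, the squared losses, and the quadratic penalty on label edits are assumptions chosen so the inner training problem has a closed form.

```python
# Hypothetical sketch of trusted-item debugging as a bilevel problem. NOT the
# paper's Algorithm 1 (DUTI); learner, losses, and penalty are assumptions.
import numpy as np

def debug_labels(X, y, X_trust, y_trust, ridge=1.0, lam=0.1, lr=0.1, steps=500):
    """Return (adjusted labels, delta); items with large |delta_i| are suspects."""
    n, d = X.shape
    # Closed-form ridge "training" operator: w(y_adj) = M @ y_adj.
    M = np.linalg.solve(X.T @ X + ridge * np.eye(d), X.T)   # d x n
    delta = np.zeros(n)                                      # label edits
    for _ in range(steps):
        w = M @ (y + delta)                                  # inner problem: retrain
        resid_trust = X_trust @ w - y_trust                  # fit on trusted items
        # Gradient of 0.5*||X_t w - y_t||^2 + 0.5*lam*||delta||^2 w.r.t. delta.
        grad = M.T @ (X_trust.T @ resid_trust) + lam * delta
        delta -= lr * grad
    return y + delta, delta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    w_true = rng.normal(size=5)
    y = X @ w_true
    y[:5] += 3.0                          # inject a few label "bugs"
    X_trust = rng.normal(size=(20, 5))
    y_trust = X_trust @ w_true            # small set of clean, trusted items
    _, delta = debug_labels(X, y, X_trust, y_trust)
    print("most suspect items:", np.argsort(-np.abs(delta))[:5])
```

In practice a sparsity-encouraging penalty on the label edits would flag a small, interpretable set of suspects; the quadratic penalty here is used only to keep the sketch simple.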