Interactive Label Cleaning with Example-based Explanations

Authors: Stefano Teso, Andrea Bontempelli, Fausto Giunchiglia, Andrea Passerini

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive empirical evaluation shows that clarifying the reasons behind the model's suspicions by cleaning the counter-examples helps in acquiring substantially better data and models, especially when paired with our FIM approximation. We empirically address the following research questions: Q1: Do counter-examples contribute to cleaning the data? Q2: Which influence-based selection strategy identifies the most mislabeled counter-examples? Q3: What contributes to the effectiveness of the best counter-example selection strategy?
Researcher Affiliation | Academia | Stefano Teso, University of Trento, Trento, Italy, stefano.teso@unitn.it; Andrea Bontempelli, University of Trento, Trento, Italy, andrea.bontempelli@unitn.it; Fausto Giunchiglia, University of Trento, Trento, Italy, fausto.giunchiglia@unitn.it; Andrea Passerini, University of Trento, Trento, Italy, andrea.passerini@unitn.it
Pseudocode | Yes | The pseudo-code of CINCER is listed in Algorithm 1.
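Based on the description in the responses above, one round of an interactive loop in the style of CINCER might be sketched as follows. The margin definition, the function names (`cincer_step`, `select_counterexample`, `oracle`), and the data structures are illustrative assumptions for this sketch, not the authors' actual algorithm or interface:

```python
import numpy as np

TAU = 0.2  # margin threshold reported in the experimental setup


def margin(probs):
    # Gap between the two largest predicted class probabilities
    # (one common margin definition; an assumption, not from the paper).
    top2 = np.sort(probs)[-2:]
    return top2[1] - top2[0]


def cincer_step(model, dataset, x_new, y_new, select_counterexample, oracle):
    """One hypothetical round: flag a low-margin incoming example as
    suspicious, fetch an influential counter-example from the training
    data, let the annotator clean either label, then retrain."""
    probs = model.predict_proba(x_new)
    if margin(probs) < TAU:
        # Training example that most supports the model's suspicion.
        idx = select_counterexample(model, dataset, x_new, y_new)
        y_new, dataset = oracle(dataset, idx, x_new, y_new)
    dataset.append((x_new, y_new))
    model.fit(dataset)  # retrain from scratch on the updated data
    return model, dataset
```

The suspicion test and the counter-example selection are deliberately factored out, since the paper's contribution is precisely in comparing influence-based selection strategies for the latter.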
Open Source Code | Yes | The code for all experiments is available at: https://github.com/abonte/cincer.
Open Datasets | Yes | Data sets. We used a diverse set of classification data sets: Adult [27]: data set of 48,800 persons... Breast [27]: data set of 569 patients... 20NG [27]: data set of newsgroup posts... MNIST [29]: handwritten digit recognition data set... Fashion [30]: fashion article classification dataset...
Dataset Splits | Yes | For adult and breast, a random 80:20 training-test split is used, while for MNIST, fashion and 20NG the split provided with the data set is used.
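The random 80:20 split described for adult and breast can be sketched with a seeded index permutation; the features and labels below are random placeholders, not the actual data:

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder features/labels standing in for e.g. the adult data set.
X = rng.random((1000, 8))
y = rng.integers(0, 2, size=1000)

# Random 80:20 training-test split via a shuffled index permutation.
perm = rng.permutation(len(X))
cut = int(0.8 * len(X))
train_idx, test_idx = perm[:cut], perm[cut:]
X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

print(X_train.shape[0], X_test.shape[0])  # 800 200
```

Seeding the generator keeps the split reproducible across runs, which matters when the cleaning loop retrains repeatedly on the same training pool.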
Hardware Specification | Yes | All experiments were run on a 12-core machine with 16 GiB of RAM and no GPU.
Software Dependencies | No | We implemented CINCER using Python and TensorFlow [25] on top of three classifiers and compared different counter-example selection strategies on five data sets.
Experiment Setup | Yes | Upon receiving a new example, the classifier is retrained from scratch for 100 epochs using Adam [31] with default parameters, with early stopping when the accuracy on the training set reaches 90% for FC and CNN, and 70% for LR. The margin threshold is set to τ = 0.2.
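The retraining schedule quoted above (up to 100 epochs, stopping early once training accuracy reaches the target) could be sketched as below; `train_one_epoch` and `train_accuracy` are hypothetical callables standing in for the actual Adam update and evaluation, not the authors' code:

```python
def retrain(model, data, train_one_epoch, train_accuracy,
            max_epochs=100, target_acc=0.90):
    """Retrain from scratch for up to max_epochs, stopping early once
    training-set accuracy reaches the target (0.90 for FC/CNN and
    0.70 for LR in the setup quoted above). Returns epochs run."""
    for epoch in range(max_epochs):
        train_one_epoch(model, data)              # e.g. one Adam pass
        if train_accuracy(model, data) >= target_acc:
            return epoch + 1                      # early stop
    return max_epochs
```

Note that this stops on *training* accuracy, as stated in the quoted setup; it is a stopping criterion for the repeated retraining in the interactive loop rather than validation-based early stopping.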