Sample Selection via Contrastive Fragmentation for Noisy Label Regression

Authors: Chris Dongjoo Kim, Sangwoo Moon, Jihwan Moon, Dongyeon Woo, Gunhee Kim

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively perform experiments on six newly curated benchmark datasets from diverse domains, including age prediction, price prediction, and music production year estimation. We also introduce a metric called Error Residual Ratio (ERR) to better account for varying degrees of label noise. Our approach consistently outperforms fourteen state-of-the-art baselines and is robust to both symmetric and random Gaussian label noise.
Researcher Affiliation | Collaboration | ¹Seoul National University, ²LG AI Research; {cdjkim, sangwoo.moon, jihwan.moon, dongyeon.woo}@vision.snu.ac.kr, gunhee@snu.ac.kr
Pseudocode | Yes | Algorithm 1 in the Appendix summarizes the overall procedure.
Open Source Code | Yes | The code is available at https://github.com/cdjkim/ConFrag.
Open Datasets | Yes | We create six benchmark datasets for noisy labeled regression that contain a sufficient quantity of balanced data, span multiple domains, and present a meaningful level of complexity. (i) Age Prediction from an image is a well-studied regression problem [Li et al., 2019, Shin et al., 2022, Lim et al., 2020]. To address this domain, we acquire four datasets: AFAD [Niu et al., 2016], IMDB-Clean [Yiming et al., 2021], IMDB-WIKI [Rothe et al., 2018], and UTKFace [Zhifei et al., 2017]. (ii) Commodity Price Prediction... We opt for the SHIFT15M dataset [Kimura et al., 2021]. (iii) Music Production Year Estimation uses the tabular MSD dataset [Bertin-Mahieux et al., 2011].
Dataset Splits | Yes | Table 7: Dataset statistics on the six newly curated balanced datasets for regression: AFAD-B [Niu et al., 2016], IMDB-Clean-B [Yiming et al., 2021], IMDB-WIKI-B [Rothe et al., 2018], UTKFace-B [Zhifei et al., 2017], SHIFT15M-B [Kimura et al., 2021], and MSD-B [Bertin-Mahieux et al., 2011]. ... AFAD-B: range [15, 40]; split sizes 27,647 / 1,627 / 3,252; total 32,526 ... IMDB-Clean-B: range [15, 66]; split sizes 44,200 / 2,600 / 5,200; total 52,000 ...
Hardware Specification | Yes | All experiments are conducted on NVIDIA Quadro 6000 GPUs with 24 GB of memory.
Software Dependencies | Yes | For implementation, we use Python 3.9 and PyTorch 1.13.1.
Experiment Setup | Yes | ConFrag uses the cosine annealing learning-rate schedule [Loshchilov and Hutter, 2017] with a minimum learning rate of η_min = 0. Optimization is carried out with the Adam optimizer [Kingma and Ba, 2015]. For the K-nearest-neighbor-based prediction, we experiment with K ∈ {3, 5, 7}. The number of fragments, denoted F, is fixed at four throughout all experiments. To determine the buffer range for jittering, we search over the values {0, 0.05, 0.1}. ... The age prediction datasets, IMDB-Clean-B [Yiming et al., 2021], AFAD-B [Niu et al., 2016], IMDB-WIKI-B [Rothe et al., 2018], and UTKFace-B [Zhifei et al., 2017], are trained for 120 epochs with a learning rate of 0.001.
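The Research Type row mentions robustness to symmetric and random Gaussian label noise. The sketch below shows one common way to inject such noise into regression labels. It assumes that symmetric noise replaces a fixed fraction of labels with values drawn uniformly from the label range, and that Gaussian noise perturbs every label with zero-mean noise clipped to that range. The paper's exact noise definitions may differ, and `add_symmetric_noise` / `add_gaussian_noise` are hypothetical helper names.

```python
import random


def add_symmetric_noise(labels, noise_rate, low, high, seed=0):
    """Replace round(noise_rate * n) randomly chosen labels with values drawn
    uniformly from [low, high]. (Assumed definition of symmetric noise.)"""
    rng = random.Random(seed)
    noisy = list(labels)
    n_corrupt = round(noise_rate * len(labels))
    for i in rng.sample(range(len(labels)), n_corrupt):
        noisy[i] = rng.uniform(low, high)
    return noisy


def add_gaussian_noise(labels, sigma, low, high, seed=0):
    """Add zero-mean Gaussian noise with standard deviation sigma to every label,
    then clip to [low, high]. (Assumed definition of Gaussian noise.)"""
    rng = random.Random(seed)
    return [min(max(y + rng.gauss(0.0, sigma), low), high) for y in labels]


# Usage on toy ages within the AFAD-B range [15, 40].
clean = [15, 18, 20, 22, 25, 28, 30, 33, 36, 40]
noisy_sym = add_symmetric_noise(clean, noise_rate=0.4, low=15, high=40)
noisy_gau = add_gaussian_noise(clean, sigma=3.0, low=15, high=40)
```

Clipping keeps corrupted labels inside the valid label range, so the noise changes which value a sample carries but never produces an impossible age.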
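For the two datasets whose statistics survive in the Dataset Splits row, the three split sizes sum exactly to the reported total, and the proportions come out to about 85% / 5% / 10%. The check below reproduces that arithmetic. It assumes the three split columns are train, validation, and test, in that order; the extracted text does not name the columns.

```python
# Split sizes from the Dataset Splits row; column order (train, val, test) is assumed.
SPLITS = {
    "AFAD-B": (27647, 1627, 3252, 32526),
    "IMDB-Clean-B": (44200, 2600, 5200, 52000),
}


def split_fractions(sizes):
    """Verify the splits sum to the reported total and return each split's share."""
    *parts, total = sizes
    assert sum(parts) == total, "splits do not sum to the reported total"
    return [round(p / total, 3) for p in parts]


for name, sizes in SPLITS.items():
    print(name, split_fractions(sizes))
```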
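The Experiment Setup row fixes the number of fragments at F = 4. The sketch below is an illustration under stated assumptions, not the authors' code. It assumes fragments are contiguous, equal-width intervals of the label range and that each fragment i is paired with fragment i + F/2, so the two members of a pair lie half the label range apart. Both the interval scheme and the pairing rule are assumptions, and the jittering buffer is omitted.

```python
def fragment_index(y, low, high, num_fragments=4):
    """Map label y in [low, high] to one of num_fragments equal-width intervals
    (assumed fragmentation scheme)."""
    width = (high - low) / num_fragments
    idx = int((y - low) // width)
    # Clamp so the upper endpoint y == high falls into the last fragment.
    return min(max(idx, 0), num_fragments - 1)


def contrastive_pairs(num_fragments=4):
    """Pair fragment i with fragment i + F/2 (assumed pairing; F must be even)."""
    if num_fragments % 2:
        raise ValueError("num_fragments must be even")
    half = num_fragments // 2
    return [(i, i + half) for i in range(half)]


# Usage on the AFAD-B label range [15, 40]: each fragment spans 6.25 years.
print([fragment_index(y, 15, 40) for y in (15, 21.25, 30, 40)])  # [0, 1, 2, 3]
print(contrastive_pairs(4))  # [(0, 2), (1, 3)]
```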
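The setup also mentions a K-nearest-neighbor-based prediction with K ∈ {3, 5, 7}. The sketch below shows a generic kNN majority-vote filter, offered only to illustrate the idea. It assumes each sample has a feature vector (for instance from a trained encoder) and a fragment index derived from its possibly noisy label. A sample is kept when the majority fragment among its K nearest neighbors (Euclidean distance, excluding the sample itself) matches its own. This is not claimed to be ConFrag's exact selection rule, and `knn_fragment_vote` / `select_clean` are hypothetical names.

```python
import math
from collections import Counter


def knn_fragment_vote(query, features, fragments, k=5):
    """Majority fragment among the k nearest neighbours of query (Euclidean)."""
    order = sorted(range(len(features)), key=lambda i: math.dist(query, features[i]))
    votes = Counter(fragments[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def select_clean(features, fragments, k=5):
    """Indices of samples whose kNN vote (excluding themselves) agrees with
    their labelled fragment."""
    keep = []
    for i, x in enumerate(features):
        others = [j for j in range(len(features)) if j != i]
        pred = knn_fragment_vote(
            x, [features[j] for j in others], [fragments[j] for j in others], k
        )
        if pred == fragments[i]:
            keep.append(i)
    return keep


# Two feature clusters; the last point sits in cluster 0 but carries fragment 2.
features = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (0.5, 0.5)]
fragments = [0, 0, 0, 0, 2, 2, 2, 2, 2]
print(select_clean(features, fragments, k=3))  # [0, 1, 2, 3, 4, 5, 6, 7]
```

The mislabeled point is dropped because all three of its nearest neighbors belong to fragment 0. An odd K avoids ties when only two fragments compete.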
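For the age-prediction setup in the Experiment Setup row (Adam, base learning rate 0.001, 120 epochs, cosine annealing with η_min = 0), the learning rate at epoch t follows the closed-form cosine schedule η_t = η_min + (η_max − η_min)(1 + cos(πt / T))/2. In PyTorch this corresponds to `torch.optim.Adam(model.parameters(), lr=1e-3)` combined with `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=120, eta_min=0)`, assuming the scheduler is stepped once per epoch with T_max equal to the epoch count. The dependency-free sketch below evaluates that formula and enumerates the (K, jitter buffer) grid the row describes.

```python
import math
from itertools import product


def cosine_annealing_lr(epoch, base_lr=1e-3, eta_min=0.0, t_max=120):
    """Closed-form cosine annealing without restarts (assumes per-epoch stepping)."""
    return eta_min + (base_lr - eta_min) * (1 + math.cos(math.pi * epoch / t_max)) / 2


# Hyperparameter grid from the setup: K in {3, 5, 7}, jitter buffer in {0, 0.05, 0.1}.
GRID = list(product([3, 5, 7], [0.0, 0.05, 0.1]))

print(cosine_annealing_lr(0), cosine_annealing_lr(60), cosine_annealing_lr(120))
print(len(GRID), "configurations")
```

The rate starts at 0.001, reaches half that value at the midpoint (epoch 60), and decays to η_min = 0 at epoch 120. The grid yields 3 × 3 = 9 configurations per dataset.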