Sample Selection via Contrastive Fragmentation for Noisy Label Regression
Authors: Chris Dongjoo Kim, Sangwoo Moon, Jihwan Moon, Dongyeon Woo, Gunhee Kim
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on six newly curated benchmark datasets from diverse domains, including age prediction, price prediction, and music production year estimation. We also introduce a metric called Error Residual Ratio (ERR) to better account for varying degrees of label noise. Our approach consistently outperforms fourteen state-of-the-art baselines and is robust to both symmetric and random Gaussian label noise. |
| Researcher Affiliation | Collaboration | Seoul National University; LG AI Research. {cdjkim, sangwoo.moon, jihwan.moon, dongyeon.woo}@vision.snu.ac.kr, gunhee@snu.ac.kr |
| Pseudocode | Yes | Algorithm 1 in Appendix summarizes the overall procedure. |
| Open Source Code | Yes | The code is available at https://github.com/cdjkim/ConFrag. |
| Open Datasets | Yes | We create six benchmark datasets for noisy-label regression that contain a sufficient quantity of balanced data, span multiple domains, and present a meaningful level of complexity. (i) Age Prediction from an image is a well-studied regression problem [Li et al., 2019, Shin et al., 2022, Lim et al., 2020]. For this domain, we use four datasets: AFAD [Niu et al., 2016], IMDB-Clean [Yiming et al., 2021], IMDB-WIKI [Rothe et al., 2018], and UTKFace [Zhifei et al., 2017]. (ii) Commodity Price Prediction... We opt for the SHIFT15M dataset [Kimura et al., 2021]. (iii) Music Production Year Estimation uses the tabular MSD dataset [Bertin-Mahieux et al., 2011]. |
| Dataset Splits | Yes | Table 7: Dataset statistics for the six newly curated balanced regression datasets: AFAD-B [Niu et al., 2016], IMDB-Clean-B [Yiming et al., 2021], IMDB-WIKI-B [Rothe et al., 2018], UTKFace-B [Zhifei et al., 2017], SHIFT15M-B [Kimura et al., 2021], and MSD-B [Bertin-Mahieux et al., 2011]. ... AFAD-B: label range [15, 40]; splits of 27,647 / 1,627 / 3,252 samples; 32,526 in total. ... IMDB-Clean-B: label range [15, 66]; splits of 44,200 / 2,600 / 5,200 samples; 52,000 in total. ... |
| Hardware Specification | Yes | All experiments are conducted on NVIDIA Quadro 6000 GPUs with 24 GB of memory. |
| Software Dependencies | Yes | For implementation, we use Python 3.9 and PyTorch 1.13.1. |
| Experiment Setup | Yes | ConFrag uses the cosine annealing learning-rate schedule [Loshchilov and Hutter, 2017] with a minimum learning rate of η_min = 0, and optimization is carried out with the Adam optimizer [Kingma and Ba, 2015]. For the K-nearest-neighbors-based prediction, K is chosen from {3, 5, 7}. The number of fragments, F, is fixed at four in all experiments. The buffer range for jittering is searched over {0, 0.05, 0.1}. ... The age prediction datasets, IMDB-Clean-B [Yiming et al., 2021], AFAD-B [Niu et al., 2016], IMDB-WIKI-B [Rothe et al., 2018], and UTKFace-B [Zhifei et al., 2017], are trained for 120 epochs with a learning rate of 0.001. |
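
The Research Type row reports robustness to symmetric and random Gaussian label noise. The sketch below shows one common way to inject these two noise types into regression labels. The exact corruption rules used in the paper are not given here, so both definitions (uniform resampling of a random fraction of labels for symmetric noise, additive zero-mean Gaussian noise clipped to the label range for Gaussian noise) are assumptions, as are the function names.

```python
import random
from typing import List, Sequence


def add_symmetric_noise(labels: Sequence[float], noise_rate: float,
                        low: float, high: float, seed: int = 0) -> List[float]:
    """Assumed symmetric noise: a `noise_rate` fraction of labels, chosen at
    random, is replaced by a value drawn uniformly from [low, high]."""
    rng = random.Random(seed)
    noisy = list(labels)
    n_corrupt = int(round(noise_rate * len(noisy)))
    for i in rng.sample(range(len(noisy)), n_corrupt):
        noisy[i] = rng.uniform(low, high)
    return noisy


def add_gaussian_noise(labels: Sequence[float], sigma: float,
                       low: float, high: float, seed: int = 0) -> List[float]:
    """Assumed Gaussian noise: zero-mean noise with standard deviation `sigma`
    is added to every label, and the result is clipped to [low, high]."""
    rng = random.Random(seed)
    return [min(max(y + rng.gauss(0.0, sigma), low), high) for y in labels]


if __name__ == "__main__":
    ages = [float(a) for a in range(15, 41)]  # clean labels spanning AFAD-B's [15, 40]
    print(add_symmetric_noise(ages, noise_rate=0.4, low=15, high=40)[:5])
    print(add_gaussian_noise(ages, sigma=5.0, low=15, high=40)[:5])
```

A variant of the Gaussian case could corrupt only a random subset of labels, as the symmetric case does; which variant the paper uses is not stated in this excerpt.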
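
The two rows quoted from Table 7 in the Dataset Splits entry can be checked with simple arithmetic: in both, the three split sizes sum exactly to the total. The snippet below computes each split's share. Reading the three sizes as train, validation, and test splits is an assumption, since the column headers are elided in the quote.

```python
from typing import Sequence, Tuple

# Split sizes and totals copied from the Dataset Splits row (Table 7).
# Interpreting the three sizes as train/validation/test is an assumption.
TABLE7_ROWS = {
    "AFAD-B": ((27647, 1627, 3252), 32526),
    "IMDB-Clean-B": ((44200, 2600, 5200), 52000),
}


def split_fractions(splits: Sequence[int], total: int) -> Tuple[float, ...]:
    """Return each split's share of the total after checking the sizes add up."""
    if sum(splits) != total:
        raise ValueError(f"splits sum to {sum(splits)}, expected {total}")
    return tuple(s / total for s in splits)


for name, (splits, total) in TABLE7_ROWS.items():
    shares = split_fractions(splits, total)
    print(name, [f"{share:.1%}" for share in shares])
```

Both datasets come out at the same 85.0% / 5.0% / 10.0% proportions.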
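
The Experiment Setup row pairs Adam with cosine annealing down to η_min = 0, and for the age datasets the base learning rate is 0.001 over 120 epochs. The closed-form schedule from Loshchilov and Hutter (without restarts) is sketched below in plain Python. In PyTorch the same schedule is provided by `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max, eta_min=0)` wrapped around `torch.optim.Adam(params, lr=1e-3)`. Whether the authors step the scheduler per epoch or per iteration is not stated, so per-epoch stepping here is an assumption.

```python
import math
from typing import List


def cosine_annealing_lr(epoch: int, total_epochs: int, eta_max: float,
                        eta_min: float = 0.0) -> float:
    """Cosine annealing without restarts:
    eta_t = eta_min + (eta_max - eta_min) * (1 + cos(pi * t / T)) / 2."""
    progress = epoch / total_epochs
    return eta_min + 0.5 * (eta_max - eta_min) * (1.0 + math.cos(math.pi * progress))


def schedule(total_epochs: int = 120, eta_max: float = 1e-3) -> List[float]:
    """Per-epoch learning rates using the age-prediction values quoted above
    (120 epochs, base rate 0.001, eta_min = 0)."""
    return [cosine_annealing_lr(e, total_epochs, eta_max) for e in range(total_epochs)]


lrs = schedule()
print(lrs[0], lrs[60], lrs[-1])
```

The rate starts at 0.001, passes 0.0005 halfway through training, and decays smoothly toward η_min = 0 at the final epoch.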
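
The Experiment Setup row also mentions a K-nearest-neighbors-based prediction with K chosen from {3, 5, 7}. The generic sketch below predicts a discrete label for a query feature vector by majority vote over its K nearest stored features under Euclidean distance. What the paper's neighbors vote on (for example, fragment indices of samples in a learned feature space) is not stated in this excerpt, so the distance, the tie-breaking rule, and the name `knn_predict` are all assumptions.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def knn_predict(query: Sequence[float], features: Sequence[Sequence[float]],
                labels: Sequence[Hashable], k: int) -> Hashable:
    """Majority vote among the k stored samples closest to `query`
    (Euclidean distance). Ties go to the label with the nearest neighbor."""
    if not 1 <= k <= len(features):
        raise ValueError("k must be between 1 and the number of stored samples")
    # Indices of stored samples sorted nearest-first.
    order = sorted(range(len(features)), key=lambda i: math.dist(query, features[i]))
    neighbors = [labels[i] for i in order[:k]]
    counts = Counter(neighbors)
    best = max(counts.values())
    # `neighbors` is ordered nearest-first, so the first label reaching the
    # top count is the tied label with the closest neighbor.
    return next(lab for lab in neighbors if counts[lab] == best)


if __name__ == "__main__":
    feats = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,), (5.3,)]
    labs = [0, 0, 0, 1, 1, 1, 1]
    for k in (3, 5, 7):
        print(k, knn_predict((0.05,), feats, labs, k))
```

In this toy example the query sits next to a cluster of only three samples, so K = 3 and K = 5 vote for label 0 while K = 7 flips to label 1; sensitivity of this kind is one reason K is treated as a tuned hyperparameter.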
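
Finally, the Experiment Setup row fixes the number of fragments at F = 4 and tunes a jittering buffer over {0, 0.05, 0.1} without spelling out either step. One plausible reading, sketched below purely as an assumption, is that the label range, normalized to [0, 1], is cut into F equal-width fragments, and that jittering shifts each interior fragment boundary by a random amount within ±buffer. The helper names are hypothetical, this is not the repository's implementation, and the contrastive pairing of fragments is not covered.

```python
import bisect
import random
from typing import List


def normalize(y: float, low: float, high: float) -> float:
    """Map a raw label from [low, high] to [0, 1]."""
    return (y - low) / (high - low)


def fragment_edges(num_fragments: int) -> List[float]:
    """Interior edges that split [0, 1] into equal-width fragments."""
    return [i / num_fragments for i in range(1, num_fragments)]


def jitter_edges(edges: List[float], buffer: float, rng: random.Random) -> List[float]:
    """Hypothetical jitter: shift each interior edge by Uniform(-buffer, buffer),
    clamp it to [0, 1], and keep the edges sorted."""
    return sorted(min(max(e + rng.uniform(-buffer, buffer), 0.0), 1.0) for e in edges)


def assign_fragment(y_norm: float, edges: List[float]) -> int:
    """Index (0 .. len(edges)) of the fragment containing a normalized label."""
    return bisect.bisect_right(edges, y_norm)


if __name__ == "__main__":
    edges = fragment_edges(4)  # F = 4, as in the quoted setup
    jittered = jitter_edges(edges, buffer=0.05, rng=random.Random(0))
    for age in (16, 27, 39):  # ages inside AFAD-B's [15, 40] range
        y = normalize(age, 15, 40)
        print(age, assign_fragment(y, edges), assign_fragment(y, jittered))
```

With a buffer of 0 the jittered edges equal the plain ones, matching the smallest value in the searched set.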