A Theoretical Analysis of the Learning Dynamics under Class Imbalance

Authors: Emanuele Francazi, Marco Baity-Jesi, Aurélien Lucchi

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We start our investigation from the following two empirical observations: (i) The learning dynamics is delayed for imbalanced problems... (ii) While the overall performance improves during the dynamics, that of the minority classes quickly deteriorates at first. This is shown in Fig. 1 (blue curves) for a binary unbalanced classification problem. We call this initial deterioration the minority initial drop (MID)... In Sec. 4.4, we validate our conclusions through experiments. (An illustrative sketch of tracking per-class recall follows the table.)
Researcher Affiliation | Academia | (1) Physics Department, EPFL, Switzerland; (2) SIAM Department, Eawag (ETH), Switzerland; (3) Department of Mathematics and Computer Science, University of Basel, Switzerland.
Pseudocode | Yes | F. Algorithms: In this section we give a summary presentation (with pseudo-codes) of the various variants of (S)GD introduced in the study. Algorithm 1: PCNGD. (A hedged sketch of a PCNGD step follows the table.)
Open Source Code | Yes | We consolidate the findings of our paper with experiments, and provide our code on GitHub (see also App. G).
Open Datasets | Yes | The runs in this work were performed using data from the CIFAR10 and CIFAR100 datasets (Krizhevsky et al., 2009). (A sketch of building an imbalanced CIFAR-10 subset follows the table.)
Dataset Splits | Yes | For each of the datasets described above, the validation set was constructed similarly to the test set (same criteria for the composition and same size) but using a different subset of images.
Hardware Specification | Yes | The models of the GPUs mounted on the servers used are: GeForce RTX 2080 Ti; GP104GL (Quadro P4000); GM200 (GeForce GTX Titan X).
Software Dependencies | No | The paper mentions "Python" in the context of the GitHub repository (Appendix G), but does not specify version numbers for Python or any specific libraries (e.g., PyTorch, TensorFlow) used for the experiments.
Experiment Setup | Yes | Before starting the network training, learning rate (LR) and batch size (BS) values need to be fixed through hyperparameter tuning. The HP tuning involved the optimization of batch size and learning rate by exhaustive grid search. The optimal hyperparameters were chosen based on the macro-averaged recall. Compared to Mod1, we thus have in Mod3 two additional HPs to be fixed through the HP validation process (DO rate and GF). (A grid-search skeleton follows the table.)
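
The minority initial drop quoted in the Research Type row can be observed by simply tracking per-class recall over training epochs on an imbalanced problem. Below is a minimal sketch of that measurement, assuming a synthetic 95/5 imbalanced binary dataset and an off-the-shelf linear classifier trained with SGD; it illustrates the measurement only and is not the paper's CIFAR-based setup, and the hyperparameters are placeholders.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import SGDClassifier
from sklearn.metrics import recall_score
from sklearn.model_selection import train_test_split

# Synthetic 95/5 imbalanced binary problem (a stand-in for the paper's image data).
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Logistic loss trained with plain SGD; the "log_loss" name requires scikit-learn >= 1.1.
clf = SGDClassifier(loss="log_loss", learning_rate="constant", eta0=0.01,
                    random_state=0)
for epoch in range(20):
    clf.partial_fit(X_tr, y_tr, classes=np.array([0, 1]))
    rec = recall_score(y_te, clf.predict(X_te), average=None)  # recall per class
    print(f"epoch {epoch:2d}  majority recall {rec[0]:.3f}  minority recall {rec[1]:.3f}")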
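
Algorithm 1 (PCNGD) is only named in the Pseudocode row. The sketch below shows one training step under the assumption that PCNGD computes a separate full-batch gradient for each class, rescales each per-class gradient to unit L2 norm, and sums them before updating the parameters. This is an interpretation written in PyTorch for concreteness; the paper does not state which framework its code uses, and this is not the authors' implementation.

import torch

def pcngd_step(model, loss_fn, X, y, lr=0.1, eps=1e-12):
    """One illustrative PCNGD step: sum over classes of the per-class gradient,
    each rescaled to unit L2 norm (assumed reading of Algorithm 1)."""
    summed = [torch.zeros_like(p) for p in model.parameters()]
    for c in torch.unique(y):
        mask = (y == c)
        model.zero_grad()
        loss_fn(model(X[mask]), y[mask]).backward()        # gradient of class c only
        grads = [p.grad.detach().clone() for p in model.parameters()]
        norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + eps
        for s, g in zip(summed, grads):
            s.add_(g / norm)                                # per-class normalization
    with torch.no_grad():
        for p, s in zip(model.parameters(), summed):
            p.sub_(lr * s)                                  # parameter update

# Example usage (toy model and data, not the paper's architecture):
# model = torch.nn.Linear(20, 2); loss_fn = torch.nn.CrossEntropyLoss()
# pcngd_step(model, loss_fn, torch.randn(64, 20), torch.randint(0, 2, (64,)))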
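
Since the experiments draw on CIFAR10/CIFAR100, the following sketch shows one way to carve a class-imbalanced binary subset out of CIFAR-10 with torchvision. The chosen classes and the 10:1 ratio are placeholders, not the configurations reported in the paper.

import numpy as np
import torchvision
from torch.utils.data import Subset
from torchvision import transforms

train = torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                     transform=transforms.ToTensor())
targets = np.array(train.targets)

def imbalanced_indices(majority_cls, minority_cls, n_major, n_minor, seed=0):
    """Pick n_major examples of one class and n_minor of another (placeholder ratio)."""
    rng = np.random.default_rng(seed)
    maj = rng.choice(np.where(targets == majority_cls)[0], n_major, replace=False)
    mino = rng.choice(np.where(targets == minority_cls)[0], n_minor, replace=False)
    return np.concatenate([maj, mino]).tolist()

# e.g. 5000 "airplane" (class 0) vs 500 "automobile" (class 1) images, a 10:1 imbalance
imbalanced_train = Subset(train, imbalanced_indices(0, 1, 5000, 500))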
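
The hyperparameter tuning quoted in the Experiment Setup row (exhaustive grid search over learning rate and batch size, selected by macro-averaged recall on the validation set) could be organized as in the skeleton below. The grid values are placeholders and train_and_evaluate is a hypothetical helper to be replaced by an actual training loop.

from itertools import product
from sklearn.metrics import recall_score

learning_rates = [1e-3, 1e-2, 1e-1]   # placeholder grid
batch_sizes = [32, 128, 512]          # placeholder grid

def train_and_evaluate(lr, bs):
    """Hypothetical helper: train a model with (lr, bs) and return
    (y_true, y_pred) on the validation set."""
    raise NotImplementedError("plug in the actual training loop here")

best = None
for lr, bs in product(learning_rates, batch_sizes):
    y_true, y_pred = train_and_evaluate(lr, bs)
    macro_recall = recall_score(y_true, y_pred, average="macro")
    if best is None or macro_recall > best[0]:
        best = (macro_recall, lr, bs)
print(f"best macro-averaged recall {best[0]:.3f} at lr={best[1]}, bs={best[2]}")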