Restoring balance: principled under/oversampling of data for optimal classification

Authors: Emanuele Loffredo, Mauro Pastore, Simona Cocco, Rémi Monasson

Venue: ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through numerical experiments, we show the relevance of our theoretical predictions on real datasets, on deeper architectures and with sampling strategies based on unsupervised probabilistic models.
Researcher Affiliation | Academia | Laboratoire de physique de l'École normale supérieure, CNRS-UMR8023, PSL University, Sorbonne University, Université Paris-Cité, 24 rue Lhomond, 75005 Paris, France.
Pseudocode | No | The paper describes methods in prose but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code to reproduce the theoretical curves can be found on github.
Open Datasets | Yes | We validated our predictions on (i) the Parity MNIST (pMNIST) dataset...; (ii) Fashion MNIST (FMNIST) with classes containing "Pullover" and "Shirt" images; and (iii) CelebA with classes containing faces with "Straight hair" and "Wavy hair". (A dataset-construction sketch follows the table.)
Dataset Splits | No | The paper mentions 'test set is balanced and has size 1000' for several datasets, but does not provide specific training/validation split percentages or sample counts for reproduction, nor does it cite predefined splits for these datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only mentioning 'deeper architectures' generally.
Software Dependencies | No | The paper mentions 'Scipy library (Jones et al., 2001)' and 'Scikit-learn package (Pedregosa et al., 2011)' but does not specify their version numbers.
Experiment Setup | Yes | We use RMSprop optimizer with a learning rate of 10^-5 and decay of 10^-5 and train for 100 epochs with a batch-size of 128. (A training-setup sketch follows the table.)
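
The binary tasks quoted under "Open Datasets", together with the balanced test set of size 1000 quoted under "Dataset Splits", could be assembled roughly as below. This is a minimal Python sketch, not the authors' released code: the imbalance ratio `rho`, the random seed, and all variable names are illustrative assumptions, and only the Keras-bundled datasets are shown.

```python
# Sketch (not the paper's code) of the binary, class-imbalanced datasets
# described above. `rho` (fraction of minority class kept) is an assumption;
# the balanced test set of 1000 examples follows the "Dataset Splits" quote.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)

def make_imbalanced(x, y, rho, n_test=1000):
    """Under-represent class 1 in the training set; keep a balanced test set."""
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    rng.shuffle(idx0); rng.shuffle(idx1)
    # hold out a balanced test set: n_test // 2 examples per class
    test_idx = np.concatenate([idx0[:n_test // 2], idx1[:n_test // 2]])
    tr0, tr1 = idx0[n_test // 2:], idx1[n_test // 2:]
    tr1 = tr1[: int(rho * len(tr1))]            # induce the class imbalance
    train_idx = np.concatenate([tr0, tr1])
    rng.shuffle(train_idx)
    return (x[train_idx], y[train_idx]), (x[test_idx], y[test_idx])

# (i) Parity MNIST (pMNIST): even vs odd digits
(x, y), _ = keras.datasets.mnist.load_data()
(xp_tr, yp_tr), (xp_te, yp_te) = make_imbalanced(x.astype("float32") / 255.0, y % 2, rho=0.1)

# (ii) Fashion MNIST: "Pullover" (label 2) vs "Shirt" (label 6)
(xf, yf), _ = keras.datasets.fashion_mnist.load_data()
mask = np.isin(yf, (2, 6))
(xf_tr, yf_tr), (xf_te, yf_te) = make_imbalanced(
    xf[mask].astype("float32") / 255.0, (yf[mask] == 6).astype(int), rho=0.1
)
# (iii) CelebA ("Straight hair" vs "Wavy hair") would be built analogously from
# the CelebA attribute annotations; it is not bundled with Keras.
```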
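
The training configuration quoted under "Experiment Setup" (RMSprop, learning rate 10^-5, decay 10^-5, 100 epochs, batch size 128) can be expressed in Keras roughly as follows. The two-layer model and the dummy data are placeholder assumptions, not the architecture from the paper; `InverseTimeDecay` is used here to emulate the legacy per-step `decay` argument, which may differ from the authors' exact setup.

```python
# Sketch of the quoted training setup; model and data are placeholders.
import numpy as np
from tensorflow import keras

# stand-in data; replace with e.g. (xp_tr, yp_tr) from the sketch above
x_train = np.random.rand(2048, 28 * 28).astype("float32")
y_train = np.random.randint(0, 2, size=2048)

model = keras.Sequential([
    keras.layers.Input(shape=(28 * 28,)),
    keras.layers.Dense(128, activation="relu"),   # assumed hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # binary output
])

# lr(step) = 1e-5 / (1 + 1e-5 * step), i.e. the legacy `decay=1e-5` behaviour
lr_schedule = keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=1e-5, decay_steps=1, decay_rate=1e-5
)
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=lr_schedule),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=100, batch_size=128)
```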