Restoring balance: principled under/oversampling of data for optimal classification

Authors: Emanuele Loffredo, Mauro Pastore, Simona Cocco, Rémi Monasson

Venue: ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through numerical experiments, we show the relevance of our theoretical predictions on real datasets, on deeper architectures and with sampling strategies based on unsupervised probabilistic models.
Researcher Affiliation | Academia | Laboratoire de physique de l'École normale supérieure, CNRS-UMR8023, PSL University, Sorbonne University, Université Paris-Cité, 24 rue Lhomond, 75005 Paris, France.
Pseudocode | No | The paper describes methods in prose but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | The code to reproduce the theoretical curves can be found on github.
Open Datasets | Yes | We validated our predictions on (i) the Parity MNIST (pMNIST) dataset...; (ii) Fashion MNIST (FMNIST) with classes containing "Pullover" and "Shirt" images; and (iii) CelebA with classes containing faces with "Straight hair" and "Wavy hair". (A dataset-construction sketch follows the table.)
Dataset Splits | No | The paper mentions 'test set is balanced and has size 1000' for several datasets, but does not provide specific training/validation split percentages or sample counts for reproduction, nor does it cite predefined splits for these datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only mentioning 'deeper architectures' generally.
Software Dependencies | No | The paper mentions 'Scipy library (Jones et al., 2001)' and 'Scikit-learn package (Pedregosa et al., 2011)' but does not specify their version numbers.
Experiment Setup | Yes | We use RMSprop optimizer with a learning rate of 10^-5 and decay of 10^-5 and train for 100 epochs with a batch-size of 128. (A training-setup sketch follows the table.)
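
The binary tasks quoted under "Open Datasets", together with the balanced test set of size 1000 quoted under "Dataset Splits", could be assembled roughly as below. This is a minimal Python sketch, not the authors' released code: the imbalance ratio `rho`, the random seed, and all variable names are illustrative assumptions, and only the Keras-bundled datasets are shown.

```python
# Sketch (not the paper's code) of the binary, class-imbalanced datasets
# described above. `rho` (fraction of minority class kept) is an assumption;
# the balanced test set of 1000 examples follows the "Dataset Splits" quote.
import numpy as np
from tensorflow import keras

rng = np.random.default_rng(0)

def make_imbalanced(x, y, rho, n_test=1000):
    """Under-represent class 1 in the training set; keep a balanced test set."""
    idx0, idx1 = np.where(y == 0)[0], np.where(y == 1)[0]
    rng.shuffle(idx0); rng.shuffle(idx1)
    # hold out a balanced test set: n_test // 2 examples per class
    test_idx = np.concatenate([idx0[:n_test // 2], idx1[:n_test // 2]])
    tr0, tr1 = idx0[n_test // 2:], idx1[n_test // 2:]
    tr1 = tr1[: int(rho * len(tr1))]            # induce the class imbalance
    train_idx = np.concatenate([tr0, tr1])
    rng.shuffle(train_idx)
    return (x[train_idx], y[train_idx]), (x[test_idx], y[test_idx])

# (i) Parity MNIST (pMNIST): even vs odd digits
(x, y), _ = keras.datasets.mnist.load_data()
(xp_tr, yp_tr), (xp_te, yp_te) = make_imbalanced(x.astype("float32") / 255.0, y % 2, rho=0.1)

# (ii) Fashion MNIST: "Pullover" (label 2) vs "Shirt" (label 6)
(xf, yf), _ = keras.datasets.fashion_mnist.load_data()
mask = np.isin(yf, (2, 6))
(xf_tr, yf_tr), (xf_te, yf_te) = make_imbalanced(
    xf[mask].astype("float32") / 255.0, (yf[mask] == 6).astype(int), rho=0.1
)
# (iii) CelebA ("Straight hair" vs "Wavy hair") would be built analogously from
# the CelebA attribute annotations; it is not bundled with Keras.
```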
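
The training configuration quoted under "Experiment Setup" (RMSprop, learning rate 10^-5, decay 10^-5, 100 epochs, batch size 128) can be expressed in Keras roughly as follows. The two-layer model and the dummy data are placeholder assumptions, not the architecture from the paper; `InverseTimeDecay` is used here to emulate the legacy per-step `decay` argument, which may differ from the authors' exact setup.

```python
# Sketch of the quoted training setup; model and data are placeholders.
import numpy as np
from tensorflow import keras

# stand-in data; replace with e.g. (xp_tr, yp_tr) from the sketch above
x_train = np.random.rand(2048, 28 * 28).astype("float32")
y_train = np.random.randint(0, 2, size=2048)

model = keras.Sequential([
    keras.layers.Input(shape=(28 * 28,)),
    keras.layers.Dense(128, activation="relu"),   # assumed hidden layer
    keras.layers.Dense(1, activation="sigmoid"),  # binary output
])

# lr(step) = 1e-5 / (1 + 1e-5 * step), i.e. the legacy `decay=1e-5` behaviour
lr_schedule = keras.optimizers.schedules.InverseTimeDecay(
    initial_learning_rate=1e-5, decay_steps=1, decay_rate=1e-5
)
model.compile(
    optimizer=keras.optimizers.RMSprop(learning_rate=lr_schedule),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
model.fit(x_train, y_train, epochs=100, batch_size=128)
```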