Supercharging Imbalanced Data Learning With Energy-based Contrastive Representation Transfer
Authors: Junya Chen, Zidi Xiu, Benjamin Goldstein, Ricardo Henao, Lawrence Carin, Chenyang Tao
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate the utility of our model, we consider a wide range of (semi)-synthetic and real-world tasks experimentally. All experiments are implemented with PyTorch, and our code is available from https://github.com/ZidiXiu/ECRT. More details of our setup & additional analyses are deferred to the SM Sections C-E. |
| Researcher Affiliation | Academia | 1 Duke University, 2 KAUST {junya.chen, zidi.xiu, chenyang.tao}@duke.edu |
| Pseudocode | Yes | Algorithm 1 Energy-based Causal Representation Transfer. |
| Open Source Code | Yes | All experiments are implemented with PyTorch, and our code is available from https://github.com/ZidiXiu/ECRT. |
| Open Datasets | Yes | Real-world datasets. We consider the following semi-synthetic and real datasets: (i) Imbalanced MNIST and CIFAR100: standard image classification tasks with artificially created step imbalance following [10]; (ii) Imbalanced Tiny ImageNet [2]: a scaled-down version of the classic natural image dataset ImageNet, comprised of 200 classes, 500 samples per class and 10k validation images, with different simulated imbalances applied; (iii) iNaturalist 2019 [86]: a challenging task for image classification in the wild comprised of 26k training and 3k validation images labeled with 1k classes, with a skewed distribution in label frequency; (iv) arXiv abstracts, imbalanced multi-label prediction of paper categories with 160k samples and 152 classes. All datasets are public. (An illustrative step-imbalance sampling sketch follows this table.) |
| Dataset Splits | Yes | Following the classical evaluation setup for imbalanced data learning, we learn on an imbalanced training set and report performance on a balanced validation set. iNaturalist 2019 [86]: a challenging task for image classification in the wild comprised of 26k training and 3k validation images labeled with 1k classes. We used a random 8/2 split for training and validation. (An illustrative 8/2 split sketch follows this table.) |
| Hardware Specification | No | This work used the Extreme Science and Engineering Discovery Environment (XSEDE) PSC Bridges-2 and SDSC Expanse at the service-provider through allocation TG-ELE200002 and TG-CIS210044. This describes the computing environment but does not specify exact GPU/CPU models or other detailed hardware specifications. |
| Software Dependencies | No | All experiments are implemented with PyTorch. The paper mentions PyTorch and pre-trained models, but does not specify their version numbers or other software dependencies with versioning. |
| Experiment Setup | No | We used a random 8/2 split for training and validation, and applied Adam optimizers for training. We rely on the best out-of-sample cross-entropy and GCL loss for hyperparameter tuning. The paper describes the general approach to tuning and the optimizer, but does not provide concrete hyperparameter values such as learning rate or batch size in the main text. (An illustrative Adam training loop with validation-loss model selection follows this table.) |
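
The "Open Datasets" row quotes the paper's use of artificially created step imbalance on MNIST and CIFAR100 following [10]. Below is a minimal PyTorch sketch of how such a step-imbalanced subset could be drawn. The helper name, the choice of minority classes, and the 10:1 imbalance ratio are illustrative assumptions, not the authors' exact protocol.

```python
# Illustrative sketch: simulate step imbalance on MNIST.
# Helper name, minority classes, and imbalance ratio are assumptions, not the paper's exact setup.
import numpy as np
import torch
from torch.utils.data import Subset
from torchvision import datasets, transforms

def step_imbalance_indices(targets, minority_classes, imbalance_ratio=0.1, seed=0):
    """Keep all samples of majority classes and only a fraction of minority classes."""
    rng = np.random.default_rng(seed)
    targets = np.asarray(targets)
    keep = []
    for c in np.unique(targets):
        idx = np.where(targets == c)[0]
        if c in minority_classes:
            n_keep = max(1, int(len(idx) * imbalance_ratio))
            idx = rng.choice(idx, size=n_keep, replace=False)
        keep.extend(idx.tolist())
    return sorted(keep)

train_set = datasets.MNIST("data", train=True, download=True,
                           transform=transforms.ToTensor())
# Assumption: classes 5-9 form the minority group at a 10:1 step imbalance.
indices = step_imbalance_indices(train_set.targets,
                                 minority_classes={5, 6, 7, 8, 9},
                                 imbalance_ratio=0.1)
imbalanced_train_set = Subset(train_set, indices)
```

The "Dataset Splits" row quotes a random 8/2 train/validation split. A minimal sketch of one way to implement such a split with `torch.utils.data.random_split` is shown below; the fixed seed and the toy dataset are placeholders, not details from the paper.

```python
# Illustrative 80/20 random split; seed and toy dataset are assumed placeholders.
import torch
from torch.utils.data import TensorDataset, random_split

def split_8_2(dataset, seed=42):
    """Random 8/2 train/validation split with a fixed generator seed."""
    n_train = int(0.8 * len(dataset))
    return random_split(dataset, [n_train, len(dataset) - n_train],
                        generator=torch.Generator().manual_seed(seed))

# Toy dataset only to demonstrate the split; replace with the real imbalanced dataset.
toy = TensorDataset(torch.randn(1000, 32), torch.randint(0, 10, (1000,)))
train_subset, val_subset = split_8_2(toy)
print(len(train_subset), len(val_subset))  # 800 200
```

The "Experiment Setup" row notes Adam optimizers and model selection by the best out-of-sample cross-entropy, without concrete hyperparameters in the main text. The sketch below therefore uses assumed placeholder values (learning rate 1e-3, batch size 128, a small MLP in place of the ECRT model) and only illustrates the selection criterion, not the authors' actual training configuration.

```python
# Illustrative Adam training loop with held-out cross-entropy model selection.
# Learning rate, batch size, epoch count, and the MLP are assumed placeholders.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

train_set = TensorDataset(torch.randn(800, 32), torch.randint(0, 10, (800,)))
val_set = TensorDataset(torch.randn(200, 32), torch.randint(0, 10, (200,)))
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = DataLoader(val_set, batch_size=128)

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

best_val_loss = float("inf")
for epoch in range(20):
    model.train()
    for x, y in train_loader:
        optimizer.zero_grad()
        criterion(model(x), y).backward()
        optimizer.step()
    # Keep the checkpoint with the best out-of-sample cross-entropy,
    # mirroring the tuning criterion quoted in the table above.
    model.eval()
    with torch.no_grad():
        val_loss = sum(criterion(model(x), y).item() * len(y)
                       for x, y in val_loader) / len(val_set)
    if val_loss < best_val_loss:
        best_val_loss = val_loss
        torch.save(model.state_dict(), "best_model.pt")
```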
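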
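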