Rich Feature Construction for the Optimization-Generalization Dilemma

Authors: Jianyu Zhang, David Lopez-Paz, Léon Bottou

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section presents experimental results that illustrate how the rich representations constructed with RFC can help the OoD performance and reduce the performance variance of OoD methods.
Researcher Affiliation | Collaboration | (1) New York University, New York, NY, USA; (2) Facebook AI Research, Paris, France; (3) Facebook AI Research, New York, NY, USA.
Pseudocode | Yes | Algorithm 1: Robust Empirical Risk Minimization (RERM)
Open Source Code | Yes | Code for replicating these experiments is publicly available at https://github.com/TjuJianyu/RFC/.
Open Datasets | Yes | All experiments reported in this section use the COLOREDMNIST task (Arjovsky et al., 2020)... The CAMELYON17 dataset (Bandi et al., 2018)
Dataset Splits | Yes | The task also specifies multiple runs with different seeds in order to observe the result variability. Finally, the task defines two ways to perform hyper-parameter selection: IID Tune selects hyper-parameters based on model performance on 33,560 images held out from the training data, while OoD Tune selects hyper-parameters based on the model performance observed on the fifth hospital (34,904 images). (A hedged sketch of these two selection protocols follows the table.)
Hardware Specification | No | The paper describes network architectures and training settings (e.g., "The network is a DenseNet121 model", "2-hidden-layer MLP network architecture"), but does not specify the particular hardware (CPU or GPU models, etc.) used to run the experiments.
Software Dependencies | No | The paper mentions optimizers (Adam, SGD) and model architectures (DenseNet121), but it does not provide version numbers for any software libraries (e.g., PyTorch, TensorFlow) or programming languages (e.g., Python).
Experiment Setup | Yes | All experiments use the same 2-hidden-layer MLP network architecture (390 hidden neurons), Adam optimizer, learning rate=0.0005, L2 weight regularization=0.0011, and binary cross-entropy objective function as the COLOREDMNIST benchmark (Arjovsky et al., 2020). For CAMELYON17, the network is a DenseNet121 model (Huang et al., 2017) trained by optimizing a cross-entropy loss with L2 weight decay=0.01 using SGD with learning rate=0.001, momentum=0.9, and batch size=32. (A hedged training-configuration sketch follows the table.)
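
The following is a minimal PyTorch sketch of the two training configurations quoted in the Experiment Setup row. Only the hyper-parameter values come from the table; the module layout, the 2x14x14 ColoredMNIST input shape, and the use of torchvision's densenet121 are illustrative assumptions, not the authors' code.

import torch
import torch.nn as nn
from torchvision.models import densenet121

# COLOREDMNIST setup: 2-hidden-layer MLP with 390 hidden neurons,
# Adam, lr=0.0005, L2 weight regularization=0.0011, binary cross-entropy.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(2 * 14 * 14, 390),  # 2x14x14 input shape assumed from the ColoredMNIST benchmark
    nn.ReLU(),
    nn.Linear(390, 390),
    nn.ReLU(),
    nn.Linear(390, 1),            # single logit for binary cross-entropy
)
# weight_decay is used here to approximate the quoted L2 regularization term
mlp_optimizer = torch.optim.Adam(mlp.parameters(), lr=5e-4, weight_decay=1.1e-3)
mlp_criterion = nn.BCEWithLogitsLoss()

# CAMELYON17 setup: DenseNet121, cross-entropy, SGD with lr=0.001,
# momentum=0.9, L2 weight decay=0.01, batch size 32.
net = densenet121(num_classes=2)  # binary tumor / no-tumor labels assumed
net_optimizer = torch.optim.SGD(net.parameters(), lr=1e-3, momentum=0.9, weight_decay=1e-2)
net_criterion = nn.CrossEntropyLoss()
batch_size = 32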
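
Below is a hedged sketch of the two hyper-parameter selection protocols from the Dataset Splits row. The names train_fn, eval_fn, and the two validation handles are placeholders; only the rule (pick by accuracy on the IID held-out images versus on the fifth hospital) comes from the text.

def select_hparams(candidates, train_fn, eval_fn, iid_heldout, ood_hospital5, mode="iid"):
    """Pick the candidate hyper-parameter setting with the best validation accuracy.

    mode="iid": score on images held out from the training data (IID Tune).
    mode="ood": score on the fifth hospital (OoD Tune).
    """
    val_set = iid_heldout if mode == "iid" else ood_hospital5
    best_acc, best_hp = float("-inf"), None
    for hp in candidates:
        model = train_fn(hp)            # train a model with this hyper-parameter setting
        acc = eval_fn(model, val_set)   # accuracy on the chosen validation split
        if acc > best_acc:
            best_acc, best_hp = acc, hp
    return best_hp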