Dropout Enhanced Bilevel Training

Authors: Peiran Yu, Junyi Li, Heng Huang

ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that overfitting occurs in data cleaning and meta-learning, and the method proposed in this work mitigates this issue.
Researcher Affiliation | Academia | Peiran Yu, Department of Computer Science, University of Maryland, College Park, MD 20740, USA, pyu123@umd.edu; Junyi Li, Department of Computer Science, University of Maryland, College Park, MD 20740, USA, junyili.ai@gmail.com; Heng Huang, Department of Computer Science, University of Maryland, College Park, MD 20740, USA, henghuanghh@gmail.com
Pseudocode | Yes | Algorithm 1: FSLA with dropout. (A hedged single-loop bilevel sketch follows the table.)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. There is no mention of code release, repository links, or code in supplementary materials.
Open Datasets | Yes | The experiments were performed using the MNIST dataset (LeCun et al., 2010). When training on MNIST and FMNIST, we use a fully connected network... When training on CIFAR10... We conduct experiments with the few-shot learning task; following the experimental protocols of Vinyals et al. (2016), we performed learning tasks over the Omniglot dataset.
Dataset Splits | Yes | We set the train/validation/test splits to 102/172/423, respectively. We perform 5-way-1-shot classification. More specifically, we perform 5 training tasks (N = 5). For each task, we randomly sample 5 characters from the alphabet over that client and, for each character, select 1 data point for training and 15 samples for validation. (A hedged task-sampling sketch follows the table.)
Hardware Specification | Yes | The experiments were conducted on a machine equipped with an Intel Xeon E5-2683 CPU and 4 Nvidia Tesla P40 GPUs.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming language versions like Python 3.x, or library versions like PyTorch 1.x).
Experiment Setup | Yes | The network architecture employed was 784-1024-1024-2048-10, with ReLU activation functions used for all hidden units. In all experiments conducted on MNIST and FMNIST, we set the dropout rates of all layers to the same value, denoted as p. We set γ = 0.01 and train 10000/10000 iterations for 5000/10000 MNIST/FMNIST data points. In all experiments, we let β, α and γ in FSLA and Algorithm 1 be 0.05, 0.1 and 0.8, respectively. (A hedged sketch of this network follows the table.)
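
Sketch for the Pseudocode row. The paper's Algorithm 1 ("FSLA with dropout") is not reproduced verbatim here; below is a minimal, hedged sketch of a generic single-loop bilevel update in which dropout masks are resampled at every iteration, because inner_loss and outer_loss are assumed to apply dropout internally. The function name single_loop_bilevel_step, the flat-tensor variables x (outer), y (inner), v (auxiliary linear-system variable), and the exact update rules are assumptions for illustration; only the step-size names alpha, beta, gamma mirror the Experiment Setup row.

import torch

def single_loop_bilevel_step(x, y, v, inner_loss, outer_loss,
                             alpha=0.1, beta=0.05, gamma=0.8):
    # x (outer), y (inner): flat leaf tensors with requires_grad=True; v: flat tensor.
    # inner_loss(x, y) and outer_loss(x, y) return scalars and are assumed to apply
    # dropout internally, so every call resamples fresh dropout masks.
    g = inner_loss(x, y)                                    # lower-level objective
    gy = torch.autograd.grad(g, y, create_graph=True)[0]    # grad_y g, kept differentiable
    y_next = (y - alpha * gy).detach().requires_grad_(True)

    f = outer_loss(x, y)                                    # upper-level objective
    fy = torch.autograd.grad(f, y, retain_graph=True)[0]    # grad_y f
    fx = torch.autograd.grad(f, x, allow_unused=True)[0]    # grad_x f (may be absent)
    if fx is None:                                          # e.g., data cleaning: f touches x only via y
        fx = torch.zeros_like(x)

    # One descent step on the residual of (Hess_yy g) v = grad_y f, using
    # Hessian-vector products instead of any explicit Hessian inverse.
    s = gy @ v
    hyy_v = torch.autograd.grad(s, y, retain_graph=True)[0]
    v_next = (v - gamma * (hyy_v - fy)).detach()

    # Hypergradient estimate grad_x f - (grad_xy^2 g) v, then the outer descent step.
    hxy_v = torch.autograd.grad(s, x)[0]
    x_next = (x - beta * (fx - hxy_v)).detach().requires_grad_(True)
    return x_next, y_next, v_next

Because the dropout masks inside inner_loss and outer_loss change on every call, the lower-level step and the hypergradient estimate are computed on independently perturbed networks; the exact placement of dropout and any additional terms in the paper's Algorithm 1 may differ from this sketch.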
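
Sketch for the Open Datasets and Dataset Splits rows. The following is a minimal, hedged illustration of sampling one 5-way-1-shot Omniglot task with 15 validation images per class using torchvision. The class_to_indices grouping and the sample_5way_1shot helper are illustrative assumptions; the paper's per-client alphabet sampling and the 102/172/423 split are not reconstructed here.

import random
from collections import defaultdict
from torchvision import datasets, transforms

# Download Omniglot (background split); each target index is one character class.
omniglot = datasets.Omniglot(root="./data", background=True, download=True,
                             transform=transforms.ToTensor())

# Group example indices by character class (loads each image once; fine for a sketch).
class_to_indices = defaultdict(list)
for idx, (_, label) in enumerate(omniglot):
    class_to_indices[label].append(idx)

def sample_5way_1shot(num_query=15):
    # Pick 5 character classes, then 1 training and 15 validation examples per class.
    ways = random.sample(list(class_to_indices), 5)
    support, query = [], []
    for way, cls in enumerate(ways):
        idxs = random.sample(class_to_indices[cls], 1 + num_query)
        support += [(omniglot[i][0], way) for i in idxs[:1]]
        query += [(omniglot[i][0], way) for i in idxs[1:]]
    return support, query

support_set, query_set = sample_5way_1shot()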
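
Sketch for the Experiment Setup row. This is the 784-1024-1024-2048-10 fully connected network with ReLU activations and a shared dropout rate p, written as a PyTorch sketch. Placing Dropout after each hidden activation, the default p value, and the use of PyTorch itself are assumptions; the paper only specifies the layer sizes and that all layers share the same dropout rate p.

import torch.nn as nn

def make_mlp(p=0.5):
    # 784-1024-1024-2048-10 fully connected network; hidden layers use ReLU
    # and share the same dropout rate p, as described in the Experiment Setup row.
    sizes = [784, 1024, 1024, 2048]
    layers = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        layers += [nn.Linear(d_in, d_out), nn.ReLU(), nn.Dropout(p)]
    layers.append(nn.Linear(sizes[-1], 10))   # output layer for the 10 classes
    return nn.Sequential(*layers)

model = make_mlp(p=0.5)   # p=0.5 is a placeholder; the paper sweeps a shared rate p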