Dropout Enhanced Bilevel Training
Authors: Peiran Yu, Junyi Li, Heng Huang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that overfitting occurs in data cleaning and meta-learning, and the method proposed in this work mitigates this issue. |
| Researcher Affiliation | Academia | Peiran Yu, Department of Computer Science, University of Maryland, College Park, MD 20740, USA (pyu123@umd.edu); Junyi Li, Department of Computer Science, University of Maryland, College Park, MD 20740, USA (junyili.ai@gmail.com); Heng Huang, Department of Computer Science, University of Maryland, College Park, MD 20740, USA (henghuanghh@gmail.com) |
| Pseudocode | Yes | Algorithm 1: FSLA with dropout (see the generic bilevel training sketch after this table) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. There is no mention of code release, repository links, or code in supplementary materials. |
| Open Datasets | Yes | The experiments were performed using the MNIST dataset (LeCun et al., 2010). When training on MNIST and FMNIST, we use a fully connected network... When training on CIFAR10... We conduct experiments with the few-shot learning task; following the experimental protocol of Vinyals et al. (2016), we perform learning tasks over the Omniglot dataset. |
| Dataset Splits | Yes | We set the train/validation/test splits to 102/172/423, respectively. We perform 5-way-1-shot classification. More specifically, we perform 5 training tasks (N = 5). For each task, we randomly sample 5 characters from the alphabet over that client and, for each character, select 1 data point for training and 15 samples for validation (see the episode-sampling sketch after this table). |
| Hardware Specification | Yes | The experiments were conducted on a machine equipped with an Intel Xeon E5-2683 CPU and 4 Nvidia Tesla P40 GPUs. |
| Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., programming language versions like Python 3.x, or library versions like PyTorch 1.x). |
| Experiment Setup | Yes | The network architecture employed was 784-1024-1024-2048-10, with ReLU activation functions used for all hidden units. In all experiments conducted on MNIST and FMNIST, we set the dropout rates of all layers to the same value, denoted as p. We set γ = 0.01 and train 10000/10000 iterations for 5000/10000 MNIST/FMNIST data points. In all experiments, we let β, α, and γ in FSLA and Algorithm 1 be 0.05, 0.1, and 0.8, respectively (see the network sketch after this table). |
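
The method summarized above adds dropout to the lower-level problem of a bilevel program (e.g., data cleaning) trained with the single-loop FSLA algorithm. The details of the paper's Algorithm 1 are not reproduced in this summary; the sketch below is only a rough illustration of the general pattern (alternating inner and outer stochastic updates with dropout active in the lower-level model), using a one-step-unroll hypergradient rather than FSLA's estimator. The toy data, step sizes, model, and the choice to disable dropout for the upper-level validation loss are all assumptions, not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.func import functional_call  # requires PyTorch 2.x

# Toy data-cleaning setup: learn per-example weights w (outer variable) so that a
# model trained on the weighted training loss (inner problem, with dropout) does
# well on a clean validation set.  NOT the paper's FSLA; one-step-unroll only.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Dropout(0.5), nn.Linear(64, 2))
params = {k: v.detach().clone().requires_grad_(True) for k, v in model.named_parameters()}

x_tr, y_tr = torch.randn(100, 20), torch.randint(0, 2, (100,))
x_val, y_val = torch.randn(50, 20), torch.randint(0, 2, (50,))
w = torch.zeros(100, requires_grad=True)   # outer variable: per-example weight logits
alpha, beta = 0.1, 0.05                    # inner / outer step sizes (assumed values)

for step in range(200):
    # Inner step: weighted training loss with dropout masks sampled (train mode).
    model.train()
    logits = functional_call(model, params, (x_tr,))
    losses = F.cross_entropy(logits, y_tr, reduction="none")
    inner_loss = (torch.sigmoid(w) * losses).mean()
    grads = torch.autograd.grad(inner_loss, list(params.values()), create_graph=True)
    new_params = {k: p - alpha * g for (k, p), g in zip(params.items(), grads)}

    # Outer step: validation loss through the one-step-updated parameters
    # (dropout disabled here by assumption).
    model.eval()
    val_loss = F.cross_entropy(functional_call(model, new_params, (x_val,)), y_val)
    w_grad, = torch.autograd.grad(val_loss, w)
    with torch.no_grad():
        w -= beta * w_grad
    # Commit the inner update before the next iteration.
    params = {k: p.detach().requires_grad_(True) for k, p in new_params.items()}
```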
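
The 5-way-1-shot protocol quoted in the Dataset Splits row (5 characters per task, 1 support example and 15 validation examples per character) can be written as a simple episode sampler. The `alphabet` mapping and the function signature below are hypothetical; only the counts come from the paper.

```python
import random

def sample_episode(alphabet, n_way=5, k_shot=1, n_query=15):
    """Sketch of 5-way-1-shot episode construction: sample n_way characters from
    the client's alphabet, then take k_shot examples per character for training
    (support) and n_query for validation (query).  `alphabet` is assumed to map
    character names to lists of examples; this interface is not from the paper."""
    chars = random.sample(list(alphabet), n_way)
    support, query = [], []
    for label, ch in enumerate(chars):
        examples = random.sample(alphabet[ch], k_shot + n_query)
        support += [(x, label) for x in examples[:k_shot]]
        query += [(x, label) for x in examples[k_shot:]]
    return support, query
```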
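
The Experiment Setup row specifies a 784-1024-1024-2048-10 fully connected network with ReLU hidden units and a single dropout rate p shared across all layers. A minimal PyTorch sketch, assuming dropout is applied after each hidden activation (the exact placement and the default value of p are not stated in the quote):

```python
import torch.nn as nn

class DropoutMLP(nn.Module):
    """784-1024-1024-2048-10 fully connected network with ReLU hidden units and a
    shared dropout rate p on every hidden layer, as described in the setup quote.
    The placement of the dropout layers and p=0.5 are assumptions."""
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(784, 1024), nn.ReLU(), nn.Dropout(p),
            nn.Linear(1024, 1024), nn.ReLU(), nn.Dropout(p),
            nn.Linear(1024, 2048), nn.ReLU(), nn.Dropout(p),
            nn.Linear(2048, 10),
        )

    def forward(self, x):
        # Flatten 28x28 MNIST/FMNIST images to 784-dimensional vectors.
        return self.net(x.view(x.size(0), -1))
```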