Enabling Few-Shot Learning with PID Control: A Layer Adaptive Optimizer
Authors: Le Yu, Xinde Li, Pengfei Zhang, Zhentong Zhang, Fir Dunkin
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | A series of experiments conducted on four standard benchmark datasets demonstrates the efficacy of the LA-PID optimizer, indicating that LA-PID achieves state-of-the-art performance in few-shot classification and cross-domain tasks, accomplishing these objectives with fewer training steps. |
| Researcher Affiliation | Academia | 1School of Automation, Southeast University, Nanjing, China. 2Nanjing Center for Applied Mathematics, Nanjing, China. 3Key Laboratory of Measurement and Control of Complex Systems of Engineering, Ministry of Education, Southeast University, Nanjing, China. 4Southeast University Shenzhen Research Institute, Shenzhen, China. |
| Pseudocode | Yes | Algorithm 1 Layer-Adaptive PID (LA-PID) Learning |
| Open Source Code | Yes | Code is available on https://github.com/yuguopin/LA-PID. |
| Open Datasets | Yes | mini-ImageNet (Vinyals et al., 2016) consists of 100 classes with 60,000 RGB images of size 84×84. ... tiered-ImageNet (Ren et al., 2018) is composed of 608 classes... CIFAR-FS (Bertinetto et al., 2018) includes a total of 100 classes... FC100 (Oreshkin et al., 2018) is composed of 100 classes with 60,000 images. |
| Dataset Splits | Yes | mini-ImageNet consists of 100 classes with 60,000 RGB images of size 84×84. The dataset is partitioned into three non-overlapping subsets: 64 classes for training, 16 for validation, and 20 for testing. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used, such as GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper does not provide specific version numbers for any software components or libraries used in the experiments. |
| Experiment Setup | Yes | The model is trained for 30 epochs, each with 500 iterations; we set the batch size to 2 and 4 for 5-shot and 1-shot, respectively. ... we implement a cosine annealing learning rate drop strategy for the meta-optimizer, starting with an initial learning rate of 0.01 and reducing it to a minimum of 5e-4 in the outer loop (sketched below). |
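The Pseudocode row above refers to the paper's Algorithm 1, which is not reproduced here. For orientation only, the sketch below shows the textbook discrete PID update applied to a gradient signal, i.e. the general control idea the title alludes to; the gains `kp`, `ki`, `kd`, the function name, and the state layout are illustrative assumptions and not the paper's layer-adaptive LA-PID rule.

```python
# Hypothetical sketch of a textbook discrete PID update on a gradient signal.
# This is NOT the paper's layer-adaptive LA-PID formulation; gains and names are assumed.

def pid_update(param, grad, state, kp=0.1, ki=0.01, kd=0.01):
    """Update a single parameter with a PID-style rule.

    P term: the current gradient (proportional to the present error signal).
    I term: the running sum of past gradients (integral of the error).
    D term: the change in gradient since the previous step (derivative of the error).
    """
    state["integral"] = state.get("integral", 0.0) + grad
    derivative = grad - state.get("prev_grad", grad)  # zero on the first step
    state["prev_grad"] = grad
    step = kp * grad + ki * state["integral"] + kd * derivative
    return param - step, state
```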
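The Experiment Setup row quotes a cosine annealing schedule for the meta-optimizer, decaying from 0.01 to 5e-4 over 30 epochs of 500 iterations. A minimal sketch of that schedule, assuming a standard PyTorch SGD meta-optimizer in the outer loop; the model, the use of SGD, and the loop body are placeholders, not the paper's implementation.

```python
import torch

# Minimal sketch of the quoted outer-loop schedule, assuming a PyTorch SGD
# meta-optimizer; the model here is a stand-in, not the LA-PID meta-learner.
model = torch.nn.Linear(10, 5)
meta_optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

epochs, iters_per_epoch = 30, 500
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    meta_optimizer, T_max=epochs, eta_min=5e-4  # 0.01 -> 5e-4 over 30 epochs
)

for epoch in range(epochs):
    for it in range(iters_per_epoch):
        # ... inner-loop adaptation and outer-loop meta-update would go here ...
        pass
    scheduler.step()  # apply the cosine decay once per epoch
```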