Enhancing Transfer Learning with Flexible Nonparametric Posterior Sampling
Authors: Hyungi Lee, Giung Nam, Edwin Fong, Juho Lee
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present empirical evidence that demonstrates the effectiveness of our proposed posterior sampling method in practical transfer learning scenarios. Section 5.1 includes experiments conducted on vision tasks, while Section 5.2 contains experiments conducted on language tasks. |
| Researcher Affiliation | Collaboration | Hyungi Lee¹, Giung Nam¹, Edwin Fong², Juho Lee¹,³; ¹KAIST AI, ²The University of Hong Kong, ³AITRICS. {lhk2708, giung, juholee}@kaist.ac.kr, chefong@hku.hk |
| Pseudocode | Yes | Please refer to Algorithm 1 for a summary of the NPTL algorithm. |
| Open Source Code | No | We intend to make the code available to the public once the research has been published. |
| Open Datasets | Yes | C10 and C100: CIFAR-10/100 (Krizhevsky et al., 2009) consists of 10/100 classes sourced from 80 Million Tiny Images (Torralba et al., 2008), and every image in this dataset has a size of 32×32. We allocated the 60,000 publicly available images into splits of 45,000 for training, 5,000 for validation, and 10,000 for testing. |
| Dataset Splits | Yes | C10 and C100: CIFAR-10/100 (Krizhevsky et al., 2009) consists of 10/100 classes sourced from 80 Million Tiny Images (Torralba et al., 2008), and every image in this dataset has a size of 32×32. We allocated the 60,000 publicly available images into splits of 45,000 for training, 5,000 for validation, and 10,000 for testing. *(A data-loading sketch follows the table.)* |
| Hardware Specification | Yes | All experiments were conducted on NVIDIA RTX 3090 GPU machines. |
| Software Dependencies | No | Our code is constructed using the following libraries, which are available under the Apache-2.0 licence: JAX (Bradbury et al., 2018), Flax (Babuschkin et al., 2020), Optax (Babuschkin et al., 2020), TensorFlow Datasets (Abadi et al., 2015), and Transformers (Wolf et al., 2020). |
| Experiment Setup | Yes | We utilized an SGD optimizer with momentum (Polyak, 1964). The momentum value was kept constant at 0.9, and we experimented with different base learning rates using the cosine decay schedule, specifically testing values in the range of {0.1, 0.03, 0.01}. For the ResNet-20x4 experiments, training terminated after 10 epochs with a batch size of 80. Table 7 provides a comprehensive outline of the hyperparameter settings for each experiment. *(An optimizer-configuration sketch follows the table.)* |
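
The Open Datasets and Dataset Splits rows quote a 45,000/5,000/10,000 partition of CIFAR-10. The following is a minimal sketch of how such a split could be expressed with the TensorFlow Datasets split syntax; the function name and the choice to carve the validation set from the tail of the training split are assumptions for illustration, not details taken from the paper or a released codebase.

```python
# Hypothetical reconstruction of the 45k/5k/10k CIFAR-10 split quoted above.
# CIFAR-10 ships in TFDS with 50,000 train and 10,000 test images; here the
# last 5,000 training images are held out as a validation set (an assumption).
import tensorflow_datasets as tfds

def load_cifar10_splits():
    train_ds = tfds.load("cifar10", split="train[:45000]", as_supervised=True)
    val_ds   = tfds.load("cifar10", split="train[45000:]", as_supervised=True)
    test_ds  = tfds.load("cifar10", split="test", as_supervised=True)
    return train_ds, val_ds, test_ds

train_ds, val_ds, test_ds = load_cifar10_splits()
```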
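The Experiment Setup row describes SGD with momentum 0.9 and a cosine-decayed base learning rate swept over {0.1, 0.03, 0.01}, trained for 10 epochs at batch size 80. Below is a minimal Optax sketch of that optimizer configuration; the training-set size, the particular base learning rate shown, and the variable names are illustrative assumptions, since the authors' code was not public at the time of this report.

```python
# Hedged sketch of the quoted optimizer setup: SGD with momentum 0.9 and a
# cosine-decayed learning rate over 10 epochs at batch size 80.
import optax

num_train_examples = 45_000        # from the CIFAR split quoted above (assumption for this sketch)
batch_size = 80                    # ResNet-20x4 setting quoted above
num_epochs = 10
steps_per_epoch = num_train_examples // batch_size

base_learning_rate = 0.03          # paper sweeps {0.1, 0.03, 0.01}; 0.03 chosen arbitrarily here

# Cosine decay from the base learning rate to zero over the full training run.
schedule = optax.cosine_decay_schedule(
    init_value=base_learning_rate,
    decay_steps=num_epochs * steps_per_epoch,
)

# SGD with heavy-ball momentum (Polyak, 1964), as described in the table.
optimizer = optax.sgd(learning_rate=schedule, momentum=0.9)
```

In a Flax/JAX training loop, `optimizer.init(params)` would create the optimizer state and `optimizer.update(grads, opt_state, params)` would produce the per-step parameter updates.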