Enhancing Transfer Learning with Flexible Nonparametric Posterior Sampling

Authors: Hyungi Lee, Giung Nam, Edwin Fong, Juho Lee

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this section, we present empirical evidence that demonstrates the effectiveness of our proposed posterior sampling method in practical transfer learning scenarios. Section 5.1 includes experiments conducted on vision tasks, while Section 5.2 contains experiments conducted on language tasks."
Researcher Affiliation | Collaboration | "Hyungi Lee (1), Giung Nam (1), Edwin Fong (2), Juho Lee (1,3). 1: KAIST AI; 2: The University of Hong Kong; 3: AITRICS. Emails: {lhk2708, giung, juholee}@kaist.ac.kr, chefong@hku.hk"
Pseudocode | Yes | "Please refer to Algorithm 1 for a summary of the NPTL algorithm."
Open Source Code | No | "We intend to make the code available to the public once the research has been published."
Open Datasets | Yes | "C10 and C100: CIFAR-10/100 (Krizhevsky et al., 2009) consists of 10/100 classes sourced from 80 Million Tiny Images (Torralba et al., 2008), and every image in this dataset has a size of 32×32. We allocated the 60,000 publicly available images into splits of 45,000 for training, 5,000 for validation, and 10,000 for testing."
Dataset Splits | Yes | "C10 and C100: CIFAR-10/100 (Krizhevsky et al., 2009) consists of 10/100 classes sourced from 80 Million Tiny Images (Torralba et al., 2008), and every image in this dataset has a size of 32×32. We allocated the 60,000 publicly available images into splits of 45,000 for training, 5,000 for validation, and 10,000 for testing." (A hedged sketch of this split with TensorFlow Datasets follows the table.)
Hardware Specification | Yes | "All experiments were conducted on NVIDIA RTX 3090 GPU machines."
Software Dependencies | No | "Our code is constructed using the following libraries, which are available under the Apache-2.0 licence: JAX (Bradbury et al., 2018), Flax (Babuschkin et al., 2020), Optax (Babuschkin et al., 2020), TensorFlow Datasets (Abadi et al., 2015), and Transformers (Wolf et al., 2020)."
Experiment Setup | Yes | "We utilized an SGD optimizer with momentum (Polyak, 1964). The momentum value was kept constant at 0.9, and we experimented with different base learning rates using a cosine decay schedule, testing values in {0.1, 0.03, 0.01}. For the ResNet-20x4 experiments, training terminated after 10 epochs with a batch size of 80. Table 7 provides a comprehensive outline of the hyperparameter settings for each experiment." (A hedged sketch of this optimizer configuration with Optax follows the table.)
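
The CIFAR split quoted under Open Datasets / Dataset Splits (45,000 training, 5,000 validation, and 10,000 test images out of 60,000) can be expressed with TensorFlow Datasets, one of the libraries the paper lists. The following is a minimal sketch under the assumption that the validation set is carved from the tail of the official 50,000-image training split; the authors' exact partitioning and preprocessing are not specified in the quoted text.

```python
# Minimal sketch of the quoted CIFAR-10 split using TensorFlow Datasets.
# Assumption: the 5,000 validation images are taken from the tail of the
# official 50,000-image training split; the paper excerpt does not say how
# the validation images were chosen.
import tensorflow_datasets as tfds

(train_ds, val_ds, test_ds), info = tfds.load(
    "cifar10",                # use "cifar100" for C100
    split=[
        "train[:45000]",      # 45,000 training images
        "train[45000:]",      # 5,000 validation images
        "test",               # 10,000 test images
    ],
    as_supervised=True,       # yields (image, label) pairs
    with_info=True,
)
```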
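The Experiment Setup row (SGD with momentum fixed at 0.9, a cosine-decayed base learning rate from {0.1, 0.03, 0.01}, 10 epochs, batch size 80) maps directly onto Optax, which the paper lists as a dependency. The block below is a minimal sketch, not the authors' training code; the training-set size used to derive the step count is an assumption carried over from the CIFAR split above.

```python
# Minimal sketch of the quoted optimizer setup with Optax (not the authors' code).
import optax

base_lr = 0.03            # one of the tested base learning rates {0.1, 0.03, 0.01}
num_epochs = 10           # ResNet-20x4 training budget from the quoted setup
batch_size = 80
train_size = 45_000       # assumed CIFAR training-split size
steps_per_epoch = train_size // batch_size

# Cosine decay of the learning rate over the whole training run.
schedule = optax.cosine_decay_schedule(
    init_value=base_lr,
    decay_steps=num_epochs * steps_per_epoch,
)

# SGD with heavy-ball momentum (Polyak, 1964), momentum fixed at 0.9.
optimizer = optax.sgd(learning_rate=schedule, momentum=0.9)
```

In a JAX/Flax training loop, the optimizer state would then be created with `optimizer.init(params)` and parameters updated each step via `optimizer.update` followed by `optax.apply_updates`.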