POUF: Prompt-Oriented Unsupervised Fine-tuning for Large Pre-trained Models

Authors: Korawat Tanwisuth, Shujian Zhang, Huangjie Zheng, Pengcheng He, Mingyuan Zhou

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify our approach's applicability, we conduct extensive experiments on image classification, sentiment analysis, and natural language inference tasks. Across 13 image-related tasks and 15 language-related ones, the proposed approach achieves consistent improvements over the baselines.
Researcher Affiliation | Collaboration | ¹The University of Texas at Austin, ²Microsoft Azure AI.
Pseudocode | Yes | Algorithm 1: POUF pseudocode for language-augmented vision models, PyTorch-like.
Open Source Code | Yes | PyTorch code is available at https://github.com/korawat-tanwisuth/POUF.
Open Datasets | Yes | Office-31 (Saenko et al., 2010) contains 4,652 images with 31 classes from three domains: Amazon (A), Webcam (W), and DSLR (D). The language tasks use the GLUE benchmark (Wang et al., 2018).
Dataset Splits | Yes | Specifically, for each task, the data is split into D_train, D_dev, and D_test. The authors tune the hyper-parameters on D_dev and report the performance of the model on D_test. We validate the model's performance every 100 steps on D_dev and take the best validated checkpoint for the final evaluation on D_test.
Hardware Specification | Yes | All experiments are conducted using a single Nvidia Tesla V100 GPU.
Software Dependencies | No | The paper mentions using PyTorch and libraries such as CLIP and TLlib but does not specify their version numbers.
Experiment Setup | Yes | The learning rate schedule is set to η_iter = η_0 · (1 + γ · iter)^(-α), where η_0 is the initial learning rate. We adopt the following default hyper-parameters: γ = 2e-4 and α = 0.75. We set η_0 = 5e-7 for all experiments except for prompt tuning on Office-31, where η_0 = 1e-3. We use mini-batch SGD with a momentum of 0.9 and a batch size of 96 for Office-31 and Office-Home and 16 for DomainNet. The weight of the mutual-information objective, λ, is set to 0.3 for all experiments.
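
For the Dataset Splits row: a minimal sketch of the described protocol (validate on D_dev every 100 steps, keep the best checkpoint for the final D_test evaluation). The 100-step interval comes from the report; `train_step`, `evaluate`, and the loop structure are illustrative assumptions, not the authors' code.

```python
import copy

EVAL_EVERY = 100  # validate on D_dev every 100 steps, as reported

def select_best_checkpoint(model, train_step, evaluate, d_train, d_dev, num_steps):
    """Keep the checkpoint with the best D_dev score for the final D_test evaluation.

    `train_step(model, d_train)` and `evaluate(model, d_dev)` are hypothetical
    helpers standing in for one optimization step and dev-set scoring.
    """
    best_score = float("-inf")
    best_state = copy.deepcopy(model.state_dict())
    for step in range(1, num_steps + 1):
        train_step(model, d_train)
        if step % EVAL_EVERY == 0:
            score = evaluate(model, d_dev)
            if score > best_score:
                best_score = score
                best_state = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_state)  # restore the best checkpoint before testing
    return model
```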
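For the Experiment Setup row: the schedule η_iter = η_0 · (1 + γ · iter)^(-α) can be expressed with PyTorch's LambdaLR. This is a hedged sketch using the reported defaults (η_0 = 5e-7, γ = 2e-4, α = 0.75, momentum 0.9); the dummy parameter and bare training loop are placeholders, not the authors' setup.

```python
import torch

# Reported defaults; eta_0 = 1e-3 instead for prompt tuning on Office-31.
eta_0, gamma, alpha = 5e-7, 2e-4, 0.75

# Placeholder parameter standing in for the prompt / model weights being tuned.
params = [torch.nn.Parameter(torch.zeros(16, 512))]
optimizer = torch.optim.SGD(params, lr=eta_0, momentum=0.9)

# eta_iter = eta_0 * (1 + gamma * iter)^(-alpha); LambdaLR multiplies eta_0 by the
# returned factor, so the lambda returns (1 + gamma * iter)^(-alpha).
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda it: (1.0 + gamma * it) ** (-alpha)
)

for it in range(1000):   # training loop placeholder: compute loss and backward here,
    optimizer.step()     # then step the optimizer and the schedule once per iteration
    scheduler.step()
```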
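The same row reports a mutual-information objective with weight λ = 0.3, but its exact form is not reproduced in the table. The sketch below assumes the common InfoMax estimate for unlabeled predictions, I(x; y) ≈ H(E[p(y|x)]) − E[H(p(y|x))], which may differ in detail from the paper's formulation; `logits` and `alignment_loss` are hypothetical names.

```python
import torch

def mutual_information_loss(logits: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Negative of the InfoMax mutual-information estimate (to be minimized).

    I(x; y) ~ H(mean_x p(y|x)) - mean_x H(p(y|x)), computed over a batch of
    target-domain logits of shape (batch, num_classes).
    """
    probs = logits.softmax(dim=-1)
    cond_ent = -(probs * (probs + eps).log()).sum(dim=-1).mean()   # E[H(p(y|x))]
    marginal = probs.mean(dim=0)                                   # E[p(y|x)]
    marg_ent = -(marginal * (marginal + eps).log()).sum()          # H(E[p(y|x)])
    return cond_ent - marg_ent                                     # = -I(x; y)

# Usage with the reported weight (alignment_loss stands in for the other objective):
#   total_loss = alignment_loss + 0.3 * mutual_information_loss(logits)
```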