How to prepare your task head for finetuning
Authors: Yi Ren, Shangmin Guo, Wonho Bae, Danica J. Sutherland
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analytically prove this trend in an overparameterized linear setting and verify its applicability to different experimental settings; we find a non-trivial trend in feature adaptation and verify it in many cases; and we show how controlling feature adaptation can improve downstream performance. |
| Researcher Affiliation | Academia | Yi Ren University of British Columbia renyi.joshua@gmail.com Shangmin Guo University of Edinburgh s.guo@ed.ac.uk Wonho Bae University of British Columbia whbae@cs.ubc.ca Danica J. Sutherland University of British Columbia & Amii dsuth@cs.ubc.ca |
| Pseudocode | No | No explicitly labeled 'Pseudocode' or 'Algorithm' block is present in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/Joshua-Ren/how_to_prepare_taskhead. |
| Open Datasets | Yes | MNIST (LeCun, 1998), ImageNet-1K (Deng et al., 2009), CIFAR10 (Krizhevsky et al., 2009), PASCAL VOC (Everingham et al., 2015), STL10 (Coates et al., 2011), Flowers102 (Nilsback & Zisserman, 2008), Stanford Cars (Krause et al., 2013), DomainNet (Peng et al., 2019), ogbg-moltox21 (Wu et al., 2018), ogbg-molhiv (Hu et al., 2020), ogbg-molpcba (Hu et al., 2020). These datasets are all standard, publicly available benchmarks, and are cited appropriately. |
| Dataset Splits | Yes | Table 4 ('Datasets (vision and molecular graph) used in experiments') lists '# train' and '# test' columns for all datasets (e.g., MNIST: 60,000 train, 10,000 test). The paper also mentions 'validation accuracy after finetuning (FT-valid-acc for short)' and 'sweeping the optimal τ using validation accuracy'. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or specific cloud instances) are provided for the experiments. The paper generally mentions 'huge computing resources' but no specifications. |
| Software Dependencies | No | The paper mentions implementing models like ResNet, MLP, and GCN, and discusses training with SGD, but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The paper provides specific hyperparameters and training details: 'batch size is 128, hidden layer width is 128 (in the MLP head case)', learning rate 10^-3 with a cosine scheduler, 5×10^-4 weight decay, 'simple augmentations like random flipping and cropping', an HP learning rate of 3×10^-2, a maximum of 200 FT epochs, SGD with momentum (β = 0.9), and 'a batch size of 16, and a SGD optimizer with momentum (β = 0.9) but without weight decay or learning rate scheduler'. A hedged sketch of this setup appears below the table. |
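The sketch below illustrates the finetuning hyperparameters quoted in the Experiment Setup row (SGD with momentum 0.9, learning rate 10^-3, weight decay 5×10^-4, cosine schedule, flip/crop augmentations). It is a minimal reading of the reported values, not the authors' released code: the ResNet-18 backbone, the CIFAR-sized crop, and the helper name `build_finetune_setup` are illustrative assumptions.

```python
# Hedged sketch of the reported finetuning setup (PyTorch).
# Assumptions not stated in the table: torchvision ResNet-18 backbone and
# CIFAR-sized inputs; the paper evaluates several architectures and datasets.
import torch
import torch.nn as nn
import torchvision
from torchvision import transforms


def build_finetune_setup(num_classes: int = 10, max_epochs: int = 200):
    # Backbone choice is an assumption; only the task head is dataset-specific.
    model = torchvision.models.resnet18(weights=None)
    model.fc = nn.Linear(model.fc.in_features, num_classes)  # task head

    # Reported finetuning hyperparameters: SGD with momentum 0.9,
    # learning rate 1e-3, weight decay 5e-4, cosine schedule over FT epochs.
    optimizer = torch.optim.SGD(
        model.parameters(), lr=1e-3, momentum=0.9, weight_decay=5e-4
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=max_epochs
    )

    # "Simple augmentations like random flipping and cropping."
    train_transform = transforms.Compose([
        transforms.RandomHorizontalFlip(),
        transforms.RandomCrop(32, padding=4),  # crop size assumes CIFAR-sized images
        transforms.ToTensor(),
    ])
    return model, optimizer, scheduler, train_transform
```

For the head-probing stage described in the same row (HP learning rate 3×10^-2, batch size 16, no weight decay or scheduler), one would presumably build a separate SGD optimizer over only `model.fc.parameters()`; the paper's repository linked above is the authoritative reference for those details.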