Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
How to prepare your task head for finetuning
Authors: Yi Ren, Shangmin Guo, Wonho Bae, Danica J. Sutherland
ICLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We analytically prove this trend in an overparametrized linear setting, and verify its applicability to different experimental settings; we find a non-trivial trend in feature adaptation and verify it in many cases; and we show how controlling feature adaptation can improve downstream performance. |
| Researcher Affiliation | Academia | Yi Ren University of British Columbia EMAIL Shangmin Guo University of Edinburgh EMAIL Wonho Bae University of British Columbia EMAIL Danica J. Sutherland University of British Columbia & Amii EMAIL |
| Pseudocode | No | No explicitly labeled 'Pseudocode' or 'Algorithm' block is present in the paper. |
| Open Source Code | Yes | Code is available at https://github.com/Joshua-Ren/how_to_prepare_taskhead. |
| Open Datasets | Yes | MNIST (LeCun, 1998), ImageNet-1K (Deng et al., 2009), CIFAR10 (Krizhevsky et al., 2009), PASCAL VOC (Everingham et al., 2015), STL10 (Coates et al., 2011), Flowers102 (Nilsback & Zisserman, 2008), Stanford Cars (Krause et al., 2013), DomainNet (Peng et al., 2019), ogbg-moltox21 (Wu et al., 2018), ogbg-molhiv (Hu et al., 2020), ogbg-molpcba (Hu et al., 2020). These datasets are all standard, publicly available benchmarks, and are cited appropriately. |
| Dataset Splits | Yes | Table 4 ('Datasets (vision and molecular graph) used in experiments') lists '# train' and '# test' columns for all datasets (e.g., MNIST: 60,000 train / 10,000 test). The paper also mentions 'validation accuracy after finetuning (FT-valid-acc for short)' and 'sweeping the optimal τ using validation accuracy'. |
| Hardware Specification | No | No specific hardware details (such as GPU/CPU models, memory, or specific cloud instances) are provided for the experiments. The paper generally mentions 'huge computing resources' but no specifications. |
| Software Dependencies | No | The paper mentions implementing models like ResNet, MLP, and GCN, and discusses training with SGD, but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | Yes | The paper provides specific hyperparameters and training details: 'batch size is 128, hidden layer width is 128 (in the MLP head case)', 'learning rate 10⁻³, with cosine scheduler', '5×10⁻⁴ weight decay', 'simple augmentations like random flipping and cropping', 'HP learning rate is 3×10⁻²', 'maximum FT epochs is 200', 'SGD with momentum (β = 0.9)', and 'batch size of 16, and a SGD optimizer with momentum (β = 0.9) but without weight decay or a learning rate scheduler'. |
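The reported setup (SGD with momentum β = 0.9, learning rate 10⁻³, 5×10⁻⁴ weight decay, cosine scheduler, up to 200 finetuning epochs) can be sketched in plain Python to make the update rule and schedule concrete. This is a minimal illustrative sketch of these standard components, not the authors' code; the function names and the choice to fold weight decay into the gradient are our own assumptions.

```python
import math

# Hyperparameters as reported in the paper's experiment setup.
BASE_LR = 1e-3       # finetuning learning rate (10^-3)
WEIGHT_DECAY = 5e-4  # L2 weight decay
MOMENTUM = 0.9       # SGD momentum (beta)
MAX_EPOCHS = 200     # maximum finetuning epochs

def cosine_lr(epoch: int, base_lr: float = BASE_LR,
              max_epochs: int = MAX_EPOCHS) -> float:
    """Cosine-annealed learning rate, decaying from base_lr to 0."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * epoch / max_epochs))

def sgd_momentum_step(w: float, grad: float, velocity: float,
                      lr: float) -> tuple[float, float]:
    """One SGD step with momentum; weight decay is folded into the gradient."""
    g = grad + WEIGHT_DECAY * w            # add L2 penalty gradient
    velocity = MOMENTUM * velocity + g     # momentum accumulation
    w = w - lr * velocity                  # parameter update
    return w, velocity
```

In a framework like PyTorch the same configuration would typically be expressed with `torch.optim.SGD(params, lr=1e-3, momentum=0.9, weight_decay=5e-4)` plus a cosine-annealing scheduler over 200 epochs.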