On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm

Authors: Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo Zhang, Junchi Yan

ICML 2024

Reproducibility assessment. Each entry below lists the variable, the result, and the supporting LLM response.
Research Type: Experimental. We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We released our source code at https://github.com/zzp1012/Cross-Task-Linearity. (A minimal sketch of this CTL check is given below, after these entries.)
Researcher Affiliation: Academia. 1) School of Artificial Intelligence & Department of Computer Science and Engineering & MoE Lab of AI, Shanghai Jiao Tong University, Shanghai, China; 2) Shanghai Artificial Intelligence Laboratory; 3) Computer Science and Engineering, University of California San Diego.
Pseudocode: No. The paper does not contain any pseudocode or algorithm blocks.
Open Source Code: Yes. We released our source code at https://github.com/zzp1012/Cross-Task-Linearity.
Open Datasets: Yes. In Section 4.1, we conduct experiments on standard continual learning benchmark datasets, including Rotated MNIST (LeCun et al., 1998) and Split CIFAR-100 (Krizhevsky et al., 2009), with MLP and ResNet-18 (He et al., 2016). [...] In Sections 4.2 and 4.3, we directly adopt the finetuned ViTs (Dosovitskiy et al., 2020)/T5s (Raffel et al., 2020) checkpoints open-sourced by Wortsman et al. (2022a); Ilharco et al. (2023) and perform experiments on various image and text datasets. (A sketch of how a Rotated MNIST task is typically constructed is given below.)
Dataset Splits: Yes. Main Experimental Setup. In Section 4.1, we conduct experiments on standard continual learning benchmark datasets, including Rotated MNIST (LeCun et al., 1998) and Split CIFAR-100 (Krizhevsky et al., 2009), with MLP and ResNet-18 (He et al., 2016). We follow the same training procedures and hyper-parameters as in Mirzadeh et al. (2021). [...] Our analysis focuses on models trained on a training set, with all investigations evaluated on a separate test set.
Hardware Specification: No. The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU specifications, memory).
Software Dependencies: No. The paper mentions 'torchvision (contributors, 2016)' but does not give a version number for it or for any other software dependency. It also refers to ResNet-18, ViT, and T5, but these are models, not software dependencies in the sense of versioned libraries.
Experiment Setup: Yes. Detailed Experimental Settings: Multi-Layer Perceptron on the Rotated MNIST Dataset. [...] Optimization is done with the default SGD algorithm and a learning rate of 1 × 10^-1; the batch size is set to 64 and the number of training epochs is set to 1 for both pretraining and finetuning. [...] ResNet-18 on the Split CIFAR-100 Dataset. [...] Optimization is done using the default SGD algorithm with a learning rate of 5 × 10^-2. The batch size is set to 64. The number of training epochs is set to 10 for both pretraining and finetuning. (A sketch of a training loop consistent with these settings is given below.)
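
To make the Research Type entry concrete: Cross-Task Linearity (CTL) says that, for two models finetuned from the same pretrained checkpoint, the internal features of the weight-interpolated model approximately equal the linear interpolation of the two models' features. The snippet below is a minimal illustrative sketch of that check, not the released implementation at https://github.com/zzp1012/Cross-Task-Linearity; the objects model_a, model_b, and the features(model, x) helper (which returns an internal-layer activation with a batch dimension) are assumed to be supplied by the reader.

import copy
import torch
import torch.nn.functional as F

def interpolate_weights(model_a, model_b, alpha):
    """Build a model whose parameters are alpha * theta_A + (1 - alpha) * theta_B."""
    merged = copy.deepcopy(model_a)
    sd_a, sd_b = model_a.state_dict(), model_b.state_dict()
    merged.load_state_dict({k: alpha * sd_a[k] + (1 - alpha) * sd_b[k] for k in sd_a})
    return merged

@torch.no_grad()
def ctl_similarity(model_a, model_b, features, x, alpha=0.5):
    """Cosine similarity between features of the weight-interpolated model and the
    interpolation of the two models' features; CTL predicts values close to 1."""
    f_interp = features(interpolate_weights(model_a, model_b, alpha), x)
    f_mix = alpha * features(model_a, x) + (1 - alpha) * features(model_b, x)
    return F.cosine_similarity(f_interp.flatten(1), f_mix.flatten(1)).mean()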
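For the Open Datasets entry, the only named library is torchvision, and a Rotated MNIST task is built by applying one fixed rotation to plain MNIST. The following is a minimal sketch under those assumptions; the exact rotation angles and preprocessing in the paper follow Mirzadeh et al. (2021) and may differ from this illustration.

from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import torchvision.transforms.functional as TF

def rotated_mnist_loader(angle_deg, train=True, batch_size=64, root="./data"):
    """One Rotated MNIST task: MNIST with every image rotated by a fixed angle."""
    transform = transforms.Compose([
        transforms.Lambda(lambda img: TF.rotate(img, angle_deg)),  # fixed per-task rotation
        transforms.ToTensor(),
    ])
    dataset = datasets.MNIST(root=root, train=train, download=True, transform=transform)
    return DataLoader(dataset, batch_size=batch_size, shuffle=train)

# Hypothetical task schedule: task t rotates images by t * 22.5 degrees (angles illustrative).
train_loaders = [rotated_mnist_loader(t * 22.5) for t in range(5)]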
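Finally, the hyperparameters quoted in the Experiment Setup entry (default SGD, learning rate 1 × 10^-1 or 5 × 10^-2, batch size 64, 1 or 10 epochs) translate into a plain training loop along the following lines. The model and data loader are placeholders; this is a sketch consistent with the quoted numbers, not the authors' released training code.

import torch
import torch.nn as nn

def train(model, loader, lr, epochs, device="cuda"):
    """Plain supervised training with default SGD, matching the quoted settings."""
    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)  # "default SGD algorithm"
    criterion = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            optimizer.zero_grad()
            criterion(model(x), y).backward()
            optimizer.step()
    return model

# MLP on Rotated MNIST:         train(mlp, mnist_loader, lr=1e-1, epochs=1)
# ResNet-18 on Split CIFAR-100: train(resnet18, cifar_loader, lr=5e-2, epochs=10)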