On the Emergence of Cross-Task Linearity in Pretraining-Finetuning Paradigm
Authors: Zhanpeng Zhou, Zijun Chen, Yilan Chen, Bo Zhang, Junchi Yan
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide comprehensive empirical evidence supporting that CTL consistently occurs for finetuned models that start from the same pretrained checkpoint. We released our source code at https://github.com/zzp1012/Cross-Task-Linearity. |
| Researcher Affiliation | Academia | (1) School of Artificial Intelligence & Department of Computer Science and Engineering & MoE Lab of AI, Shanghai Jiao Tong University, Shanghai, China; (2) Shanghai Artificial Intelligence Laboratory; (3) Computer Science and Engineering, University of California San Diego. |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We released our source code at https://github.com/zzp1012/Cross-Task-Linearity. |
| Open Datasets | Yes | In Section 4.1, we conduct experiments on standard continual learning benchmark datasets, including Rotated MNIST (LeCun et al., 1998) and Split CIFAR-100 (Krizhevsky et al., 2009), with MLP and ResNet-18 (He et al., 2016). [...] In Sections 4.2 and 4.3, we directly adopt the finetuned ViTs (Dosovitskiy et al., 2020)/T5s (Raffel et al., 2020) checkpoints open-sourced by Wortsman et al. (2022a); Ilharco et al. (2023) and perform experiments on various image and text datasets. |
| Dataset Splits | Yes | Main Experimental Setup. In Section 4.1, we conduct experiments on standard continual learning benchmark datasets, including Rotated MNIST (LeCun et al., 1998) and Split CIFAR-100 (Krizhevsky et al., 2009), with MLP and ResNet-18 (He et al., 2016). We follow the same training procedures and hyper-parameters as in Mirzadeh et al. (2021). [...] Our analysis focuses on models trained on a training set, with all investigations evaluated on a separate test set. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., GPU models, CPU specifications, memory). |
| Software Dependencies | No | The paper mentions 'torchvision (contributors, 2016)' but does not specify a version number for it or for any other software dependency. It also refers to models such as ResNet-18, ViT, and T5, but these are models rather than versioned software libraries. |
| Experiment Setup | Yes | Detailed Experimental Settings: Multi-Layer Perceptron on the Rotated MNIST Dataset. [...] Optimization is done with the default SGD algorithm and a learning rate of 1 × 10⁻¹; the batch size is set to 64 and the number of training epochs is set to 1 for both pretraining and finetuning. [...] ResNet-18 on the Split CIFAR-100 Dataset. [...] optimization is done using the default SGD algorithm with a learning rate of 5 × 10⁻². The batch size is set to 64. The number of training epochs is set to 10 for both pretraining and finetuning. |
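
The Experiment Setup row quotes the optimizer, learning rate, batch size, and epoch count for the MLP on Rotated MNIST. Below is a minimal PyTorch sketch of that configuration; the MLP architecture, the rotation angle, and the data path are illustrative assumptions rather than values reported in the paper.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hyperparameters taken from the Experiment Setup row above.
# The MLP width/depth and the 30-degree rotation are placeholder choices.
LR, BATCH_SIZE, EPOCHS, ANGLE = 1e-1, 64, 1, 30

# A fixed rotation angle defines one Rotated MNIST "task".
transform = transforms.Compose([
    transforms.RandomRotation(degrees=(ANGLE, ANGLE)),
    transforms.ToTensor(),
])
train_set = datasets.MNIST("data", train=True, download=True, transform=transform)
loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

# Placeholder MLP; the paper's exact architecture may differ.
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(28 * 28, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
)
optimizer = torch.optim.SGD(model.parameters(), lr=LR)
criterion = nn.CrossEntropyLoss()

for _ in range(EPOCHS):
    for images, labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```

Under the pretraining-finetuning paradigm studied in the paper, the same loop would be reused for finetuning on a second rotation angle, initialized from the pretrained weights.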
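
The Research Type row summarizes the paper's central empirical claim: Cross-Task Linearity (CTL) holds for models finetuned from the same pretrained checkpoint, i.e. the internal features of a weight-interpolated model approximately equal the linear interpolation of the two models' features. The sketch below illustrates one way such a check could be implemented; the `layer_fn` hook and the cosine-similarity metric are assumptions for illustration, not the authors' exact measurement (their code is in the repository linked above).

```python
import copy
import torch
import torch.nn.functional as F

def interpolate_state_dicts(sd_a, sd_b, alpha):
    """Weight-space interpolation: alpha * A + (1 - alpha) * B.
    Non-floating-point buffers (e.g. counters) are copied from A unchanged."""
    return {
        k: alpha * sd_a[k] + (1 - alpha) * sd_b[k]
        if sd_a[k].is_floating_point() else sd_a[k]
        for k in sd_a
    }

@torch.no_grad()
def ctl_similarity(model_a, model_b, layer_fn, x, alpha=0.5):
    """Compare features of the weight-interpolated model with the linear
    interpolation of the two models' features at one layer.

    `layer_fn(model, x)` is a user-supplied hook returning the internal
    features of interest (e.g. activations after a chosen block)."""
    feat_a = layer_fn(model_a, x)
    feat_b = layer_fn(model_b, x)

    model_mix = copy.deepcopy(model_a)
    model_mix.load_state_dict(
        interpolate_state_dicts(model_a.state_dict(), model_b.state_dict(), alpha)
    )
    feat_mix = layer_fn(model_mix, x)                  # features of interpolated weights
    feat_lin = alpha * feat_a + (1 - alpha) * feat_b   # interpolation of features

    # Cosine similarity close to 1 across layers and alphas is consistent with CTL.
    return F.cosine_similarity(
        feat_mix.flatten(1), feat_lin.flatten(1), dim=1
    ).mean().item()
```

Applied to two checkpoints finetuned from the same pretrained model (for instance, two of the finetuned ViT checkpoints cited in the Open Datasets row), a value near 1 at every layer would be consistent with the CTL phenomenon the paper reports.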