Equi-Tuning: Group Equivariant Fine-Tuning of Pretrained Models

Authors: Sourya Basu, Prasanna Sattigeri, Karthikeyan Natesan Ramamurthy, Vijil Chenthamarakshan, Kush R. Varshney, Lav R. Varshney, Payel Das

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We provide experimental results for equi-tuning using a variety of pretrained models: Alexnet, Resnet, VGG, and Densenet for image classification; RNNs, GRUs, and LSTMs for compositional generalization; and GPT2 for fairness in NLG." and "6 Experiments: We provide results for equi-tuning on image classification, compositional generalization, and fairness in NLG."
Researcher Affiliation | Collaboration | Sourya Basu (1, 2, *), Prasanna Sattigeri (1), Karthikeyan Natesan Ramamurthy (1), Vijil Chenthamarakshan (1), Kush R. Varshney (1), Lav R. Varshney (2), Payel Das (1); (1) IBM Research, Thomas J. Watson Research Center; (2) University of Illinois at Urbana-Champaign
Pseudocode | Yes | "Sec. C gives an efficient implementation of (3). Sec. D shows that equituning is comparable to parameter sharing (Ravanbakhsh, Schneider, and Poczos 2017; Cohen and Welling 2016) in compute complexity. As proved in Sec. H, M_R^G is also equivariant. We restrict our experiments in this work to scalar features for simplicity. Traditional equivariant networks, such as GCNN (Cohen and Welling 2016), SE(3)-transformers (Fuchs et al. 2020), and LieConv (Finzi et al. 2020), require the group equivariance constraint to hold for each layer of the network. In contrast, for equi-tuning, we only need to ensure that the group actions are defined on the input and output layers of the pretrained model, which is a key reason for the simplicity and generality of our algorithm. Now we provide an example of equi-tuning for image processing using the c4 = {e, r, r^2, r^3} group, where e is the identity and r denotes rotation by 90°. As shown in Fig. 1b, to construct the model for equi-tuning we compute four transformations of the input and compute the features by passing them through the pretrained model in parallel. The outputs are transformed using the inverse transformations, averaged in a custom group equivariant layer, and passed through further custom equivariant layers to obtain the output. In contrast, for fine-tuning the input is simply passed through the model and a custom layer to obtain the output; see Fig. 1a. Sec. F gives examples of equituning for language models." (See the c4 equi-tuning sketch after this table.)
Open Source Code | No | The paper does not provide any explicit statement about releasing its own source code, nor does it provide a link to a code repository for the Equi-Tuning method.
Open Datasets | Yes | "We experiment on two datasets: Hymenoptera and CIFAR-10 (Krizhevsky, Nair, and Hinton 2010)." and "Hymenoptera: Obtained from https://www.kaggle.com/datasets/ajayrana/hymenoptera-data." and "Krizhevsky, A.; Nair, V.; and Hinton, G. 2010. CIFAR-10 (Canadian Institute for Advanced Research). http://www.cs.toronto.edu/kriz/cifar.html. Accessed: 2022-06-05." and "We use the SCAN dataset (Gordon et al. 2019) for our compositional generalization experiments." (See the dataset-loading sketch after this table.)
Dataset Splits | Yes | "Table 2: Equi-tuning LSTM for SCAN. LSTM and G-LSTM were trained for 200K iterations with relevant groups for each task. Equi LSTM models are LSTM models equi-tuned for 10K iterations using the group relevant to each task. Results are over three random seeds."
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., specific GPU models, CPU types, or memory).
Software Dependencies | No | The paper mentions optimizers such as 'Adam optimizer (Kingma and Ba 2015)' and 'stochastic gradient descent' and uses various pretrained models, but it does not specify versions for core software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | "We use stochastic gradient descent as the optimizer with momentum 0.9 and learning rate 3 × 10^-4. The models were fine-tuned with batch size 8 for 10 epochs over 5 different random seeds." and "All models contain a single-layer cell of the recurrent model with 64 hidden units. We train these models on the Add jump task and the Around right task for 200k iterations using the Adam optimizer (Kingma and Ba 2015) with learning rate 10^-4 and teacher-forcing ratio (Williams and Zipser 1989) 0.5." (See the training-configuration sketch after this table.)
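
A minimal PyTorch-style sketch of the c4 equi-tuning forward pass described in the Pseudocode row. This is an illustration under stated assumptions, not the authors' released implementation: `backbone` is assumed to be a pretrained feature extractor returning a spatial feature map of shape (N, C, H, W), and `head` stands in for the custom equivariant output layers mentioned in the quote.

```python
import torch
import torch.nn as nn


class C4EquiTuner(nn.Module):
    """Sketch of equi-tuning with the c4 = {e, r, r^2, r^3} rotation group."""

    def __init__(self, backbone: nn.Module, head: nn.Module):
        super().__init__()
        self.backbone = backbone  # shared pretrained model
        self.head = head          # custom equivariant/invariant output layers

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = []
        for k in range(4):                        # g in {e, r, r^2, r^3}
            gx = torch.rot90(x, k, dims=(2, 3))   # transform the input by g
            f = self.backbone(gx)                 # features from the pretrained model
            f = torch.rot90(f, -k, dims=(2, 3))   # apply the inverse transformation g^-1
            feats.append(f)
        avg = torch.stack(feats, dim=0).mean(dim=0)  # average over the group
        return self.head(avg)
```

Only the input and output transformations and the averaging step enforce c4 equivariance here; the internal layers of the pretrained backbone are left unconstrained, matching the contrast drawn in the quote with per-layer equivariant architectures such as GCNN.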
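
As a side note on the Open Datasets row, CIFAR-10 and the Hymenoptera data can be loaded with standard torchvision utilities. This is not from the paper; the directory names and preprocessing below are placeholder choices.

```python
import torchvision
import torchvision.transforms as T

transform = T.Compose([T.Resize((224, 224)), T.ToTensor()])  # placeholder preprocessing

# CIFAR-10 (Krizhevsky, Nair, and Hinton 2010) downloads automatically via torchvision.
cifar10_train = torchvision.datasets.CIFAR10(
    root="data", train=True, download=True, transform=transform
)

# The Hymenoptera data from the Kaggle link above must be downloaded manually and
# unpacked into an ImageFolder-style directory (the path below is assumed).
hymenoptera_train = torchvision.datasets.ImageFolder(
    "data/hymenoptera_data/train", transform=transform
)
```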
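
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a non-authoritative summary: `model` and `train_set` below are toy placeholders standing in for the pretrained networks and datasets used in the paper.

```python
import torch
import torch.nn as nn

# Toy placeholders; the actual experiments fine-tune pretrained CNNs and recurrent models.
model = nn.Linear(10, 2)
train_set = torch.utils.data.TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))

# Image classification fine-tuning: SGD with momentum 0.9, learning rate 3e-4,
# batch size 8, 10 epochs, 5 random seeds (values quoted above).
optimizer = torch.optim.SGD(model.parameters(), lr=3e-4, momentum=0.9)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=8, shuffle=True)
num_epochs = 10

# SCAN tasks: single-layer recurrent cells with 64 hidden units, 200k iterations,
# Adam with learning rate 1e-4 and teacher-forcing ratio 0.5.
scan_optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
teacher_forcing_ratio = 0.5
num_iterations = 200_000
```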