Group and Shuffle: Efficient Structured Orthogonal Parametrization

Authors: Mikhail Gorbunov, Nikolay Yudin, Vera Soboleva, Aibek Alanov, Alexey Naumov, Maxim Rakhuba

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically validate our method on different domains, including adapting of text-to-image diffusion models and downstream task fine-tuning in language modeling. Additionally, we adapt our construction for orthogonal convolutions and conduct experiments with 1-Lipschitz neural networks.
Researcher Affiliation | Academia | Mikhail Gorbunov (HSE University, gorbunovmikh73@gmail.com); Nikolay Yudin (HSE University); Vera Soboleva (AIRI, HSE University); Aibek Alanov (AIRI, HSE University); Alexey Naumov (HSE University, Steklov Mathematical Institute of Russian Academy of Sciences); Maxim Rakhuba (HSE University)
Pseudocode | Yes | Algorithm 1: Projection π(·) of A onto GS(P_L, P, P_R); an illustrative sketch follows the table.
Open Source Code | Yes | Source code is available at: https://github.com/Skonor/group_and_shuffle
Open Datasets | Yes | We report results on the GLUE [Wang et al., 2018] benchmark with the RoBERTa-base [Liu et al., 2019] model. We use Stable Diffusion [Rombach et al., 2022] and the Dreambooth [Ruiz et al., 2023] dataset for all our experiments. Following [Singla and Feizi, 2021], we train LipConvnet-n on the CIFAR-100 dataset. A dataset-loading sketch follows the table.
Dataset Splits | Yes | We follow the training settings of [Liu et al., 2024b, Zhang et al., 2023]. We report the best results on the evaluation set over the whole training run.
Hardware Specification | Yes | All the experiments below were conducted on an NVIDIA V100-SXM2-32GB GPU.
Software Dependencies | No | The paper mentions the use of the PEFT library but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | All the models are trained using the Adam optimizer with batch size = 4, learning rate = 0.00002, betas = (0.9, 0.999), and weight decay = 0.01. A PyTorch sketch of these settings follows the table.
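
The Pseudocode row refers to Algorithm 1, a projection onto the GS matrix class. The paper's definition of GS(P_L, P, P_R) and the projection itself are not reproduced here; the sketch below only illustrates the structure the name suggests, namely block-diagonal orthogonal factors ("group") interleaved with permutation matrices ("shuffle"). All function names, the random choice of permutations, and the exact factor ordering are illustrative assumptions, not the paper's algorithm.

```python
# Illustrative sketch of a "group and shuffle" style structured orthogonal matrix:
# block-diagonal orthogonal factors interleaved with permutations. This is NOT
# the paper's Algorithm 1 (projection onto GS(P_L, P, P_R)); it only shows why
# such a product is itself orthogonal.
import torch

def random_orthogonal(k):
    # QR decomposition of a random Gaussian matrix yields a k x k orthogonal block
    q, _ = torch.linalg.qr(torch.randn(k, k))
    return q

def block_diag_orthogonal(num_blocks, block_size):
    # "Group": block-diagonal matrix whose diagonal blocks are orthogonal
    return torch.block_diag(*[random_orthogonal(block_size) for _ in range(num_blocks)])

def random_permutation(n):
    # "Shuffle": a random n x n permutation matrix
    return torch.eye(n)[torch.randperm(n)]

def gs_like_matrix(n, block_size):
    assert n % block_size == 0
    num_blocks = n // block_size
    P_L, P, P_R = random_permutation(n), random_permutation(n), random_permutation(n)
    B1 = block_diag_orthogonal(num_blocks, block_size)
    B2 = block_diag_orthogonal(num_blocks, block_size)
    # A product of permutation and block-orthogonal factors is itself orthogonal
    return P_L @ B1 @ P @ B2 @ P_R

M = gs_like_matrix(8, 2)
print(torch.allclose(M @ M.T, torch.eye(8), atol=1e-5))  # True: M is orthogonal
```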
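The Open Datasets row cites the GLUE benchmark, which is available through the Hugging Face `datasets` library. A minimal loading sketch follows; the task name "cola" is only an example and is not necessarily how the paper's experiments access the data.

```python
# Minimal sketch: loading one GLUE task with the Hugging Face `datasets` library.
# The task name "cola" is only an example; the paper reports on the GLUE benchmark overall.
from datasets import load_dataset

glue_task = load_dataset("glue", "cola")
print(glue_task)               # DatasetDict with train / validation / test splits
print(glue_task["train"][0])   # one labeled example
```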
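The Experiment Setup row lists concrete optimizer hyperparameters. In PyTorch they would correspond to roughly the following; whether the paper uses torch.optim.Adam or the decoupled-weight-decay AdamW variant is not stated in the excerpt, and `model` below is just a placeholder.

```python
# Sketch of the reported optimizer settings in PyTorch. The choice between
# torch.optim.Adam and torch.optim.AdamW is an assumption; `model` is a placeholder.
import torch

model = torch.nn.Linear(768, 2)   # placeholder standing in for the fine-tuned network

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=2e-5,                      # learning rate = 0.00002
    betas=(0.9, 0.999),
    weight_decay=0.01,
)
```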