QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation

Authors: Zhuo Chen, Rumen Dangovski, Charlotte Loh, Owen Dugan, Di Luo, Marin Soljacic

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning, and scalability compared to traditional methods.
Researcher Affiliation | Academia | 1. NSF AI Institute for Artificial Intelligence and Fundamental Interactions; 2. Department of Physics, Massachusetts Institute of Technology; 3. Department of EECS, Massachusetts Institute of Technology; 4. Department of Physics, Harvard University. {chenzhuo,rumenrd,cloh,odugan,diluo,soljacic}@mit.edu
Pseudocode | Yes | torch.einsum("...abc,efbc,diaf,ghde->...ghi", x, T_3, T_2, T_1) and torch.einsum("efbc,diaf,ghde->ghiabc", T_3, T_2, T_1)
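
The paper's pseudocode is a pair of einsum expressions: the first applies the factor tensors T_1, T_2, T_3 directly to the input (reshaped into factored axes), and the second pre-contracts the factors into a single operator. Below is a minimal runnable sketch of that pair; only the einsum strings come from the paper, while all dimension sizes, batch shapes, and random initializations are illustrative assumptions.

import torch

# Factored dimensions (illustrative assumptions; the paper fixes only the
# einsum index pattern, not these sizes). The hidden dimension is split as
# D_in = a*b*c and D_out = g*h*i, with d, e, f as internal contracted axes.
a = b = c = d = e = f = g = h = i = 4
batch, seq = 2, 16

x = torch.randn(batch, seq, a, b, c)   # input reshaped into factored axes
T_3 = torch.randn(e, f, b, c)          # factor tensors (random stand-ins)
T_2 = torch.randn(d, i, a, f)
T_1 = torch.randn(g, h, d, e)

# First expression: contract the factors against the input directly.
y = torch.einsum("...abc,efbc,diaf,ghde->...ghi", x, T_3, T_2, T_1)

# Second expression: pre-contract the factors into one operator, then apply it.
W = torch.einsum("efbc,diaf,ghde->ghiabc", T_3, T_2, T_1)
y_merged = torch.einsum("...abc,ghiabc->...ghi", x, W)

# Both routes compute the same linear map, up to float32 rounding.
assert torch.allclose(y, y_merged, rtol=1e-4, atol=1e-4)

The pre-contracted form is presumably what allows the adaptation to be merged into a single weight tensor after training, so the einsum pair covers both the training-time and deployment-time views.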
Open Source Code | Yes | https://github.com/quanta-fine-tuning/quanta
Open Datasets | Yes | To assess the general applicability of the low-rank hypothesis, we examine two datasets of varying difficulty: the RTE dataset [49], a classification task... and the DROP dataset [50], a generation task...
Dataset Splits | Yes | Instead, we create a validation set from the train set and optimize the hyperparameters on the validation set.
Hardware Specification | Yes | All the experiments are conducted on NVIDIA A100 GPUs with 80 GB memory.
Software Dependencies | No | The paper mentions 'torch.einsum' and 'opt_einsum' as libraries used, and notes the code is implemented using [54] and [68] as references. However, no specific version numbers are provided for these software components or for the programming language.
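
Since no versions are pinned, one quick way to exercise the 'opt_einsum' dependency is to ask it for a contraction path for the paper's expression. The snippet below is a hedged illustration using opt_einsum's public contract_path API; the tensor shapes are arbitrary assumptions, not values from the paper.

import numpy as np
import opt_einsum as oe

# Arbitrary small shapes (assumptions) matching the paper's index pattern.
x = np.random.rand(2, 16, 4, 4, 4)
T_3 = np.random.rand(4, 4, 4, 4)
T_2 = np.random.rand(4, 4, 4, 4)
T_1 = np.random.rand(4, 4, 4, 4)

# contract_path returns the pairwise contraction order and a cost report,
# useful for checking that the installed opt_einsum handles this expression.
path, info = oe.contract_path("...abc,efbc,diaf,ghde->...ghi", x, T_3, T_2, T_1)
print(info)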
Experiment Setup | Yes | In Table E.2, we show the hyperparameters used for the DROP experiments. Only LoRA and QuanTA are applied to the 13- and 70-billion-parameter LLaMA2 models. For the 13-billion-parameter model or smaller, only a single A100 GPU is used for fine-tuning; for the 70-billion-parameter model, four A100 GPUs are used.