QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
Authors: Zhuo Chen, Rumen Dangovski, Charlotte Loh, Owen Dugan, Di Luo, Marin Soljačić
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning, and scalability compared to traditional methods. |
| Researcher Affiliation | Academia | ¹NSF AI Institute for Artificial Intelligence and Fundamental Interactions; ²Department of Physics, Massachusetts Institute of Technology; ³Department of EECS, Massachusetts Institute of Technology; ⁴Department of Physics, Harvard University. {chenzhuo,rumenrd,cloh,odugan,diluo,soljacic}@mit.edu |
| Pseudocode | Yes | `torch.einsum("...abc,efbc,diaf,ghde->...ghi", x, T_3, T_2, T_1)` and `torch.einsum("efbc,diaf,ghde->ghiabc", T_3, T_2, T_1)` (see the sketch after the table). |
| Open Source Code | Yes | https://github.com/quanta-fine-tuning/quanta |
| Open Datasets | Yes | To assess the general applicability of the low-rank hypothesis, we examine two datasets of varying difficulties: the RTE dataset [49], a classification task..., and the DROP dataset [50], a generation task... |
| Dataset Splits | Yes | Instead, we create a validation set from the train set and optimize the hyperparameters on the validation set. |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA A100 GPUs with 80 GB memory. |
| Software Dependencies | No | The paper mentions 'torch.einsum' and 'opt_einsum' as libraries used, and notes the code is implemented using [54] and [68] as references. However, no specific version numbers for these software components or programming languages are provided. |
| Experiment Setup | Yes | In Table E.2, we show the hyperparameters used for the DROP experiments. Only LoRA and QuanTA are applied to the 13- and 70-billion-parameter LLaMA2 models. For the 13-billion-parameter model or smaller, only a single A100 GPU is used for fine-tuning, and for the 70-billion-parameter model, four A100 GPUs are used. |
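
The einsum pseudocode quoted in the table encodes QuanTA's quantum-circuit-inspired contraction: the hidden dimension is split into several factorized axes, and a set of four-index tensors, each acting on a pair of axes like a two-qubit gate, is contracted against them. Below is a minimal runnable sketch of those two contractions. The axis sizes, batch shape, and the choice of random tensors are illustrative assumptions for this sketch, not the paper's actual configuration; only the einsum strings and the names `T_1`, `T_2`, `T_3` come from the quoted pseudocode.

```python
import torch

# Illustrative (assumed) axis sizes: the hidden dimension is split into three
# factorized axes, e.g. 4 * 4 * 4 = 64. The real split depends on the model.
d_a, d_b, d_c = 4, 4, 4      # input axes  a, b, c
d_g, d_h, d_i = 4, 4, 4      # output axes g, h, i
d_int = 4                    # internal contracted indices d, e, f
batch, seq = 2, 8

# Activations reshaped so the hidden dimension becomes three axes (a, b, c).
x = torch.randn(batch, seq, d_a, d_b, d_c)

# Each T_k is a four-index tensor acting on a pair of axes, analogous to a
# two-qubit gate in a quantum circuit (names follow the table's pseudocode).
T_3 = torch.randn(d_int, d_int, d_b, d_c)   # indices e f b c
T_2 = torch.randn(d_int, d_i, d_a, d_int)   # indices d i a f
T_1 = torch.randn(d_g, d_h, d_int, d_int)   # indices g h d e

# First einsum from the table: apply the tensors directly to the activations.
y = torch.einsum("...abc,efbc,diaf,ghde->...ghi", x, T_3, T_2, T_1)
print(y.shape)        # torch.Size([2, 8, 4, 4, 4])

# Second einsum from the table: pre-contract the tensors into one operator,
# which can then be flattened into a dense update matrix.
W = torch.einsum("efbc,diaf,ghde->ghiabc", T_3, T_2, T_1)
W_matrix = W.reshape(d_g * d_h * d_i, d_a * d_b * d_c)
print(W_matrix.shape)  # torch.Size([64, 64])
```

The second form collapses the trainable tensors into a single dense operator, which is presumably what allows the learned adaptation to be folded back into the base weight matrix after fine-tuning.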