QuanTA: Efficient High-Rank Fine-Tuning of LLMs with Quantum-Informed Tensor Adaptation
Authors: Zhuo Chen, Rumen Dangovski, Charlotte Loh, Owen Dugan, Di Luo, Marin Soljačić
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that QuanTA significantly enhances commonsense reasoning, arithmetic reasoning, and scalability compared to traditional methods. |
| Researcher Affiliation | Academia | ¹NSF AI Institute for Artificial Intelligence and Fundamental Interactions; ²Department of Physics, Massachusetts Institute of Technology; ³Department of EECS, Massachusetts Institute of Technology; ⁴Department of Physics, Harvard University. {chenzhuo,rumenrd,cloh,odugan,diluo,soljacic}@mit.edu |
| Pseudocode | Yes | `torch.einsum("...abc,efbc,diaf,ghde->...ghi", x, T_3, T_2, T_1)` and `torch.einsum("efbc,diaf,ghde->ghiabc", T_3, T_2, T_1)` (see the sketch after the table). |
| Open Source Code | Yes | https://github.com/quanta-fine-tuning/quanta |
| Open Datasets | Yes | To assess the general applicability of the low-rank hypothesis, we examine two datasets of varying difficulties: the RTE dataset [49], a classification task..., and the DROP dataset [50], a generation task... |
| Dataset Splits | Yes | Instead, we create a validation set from the train set and optimize the hyperparameters on the validation set. |
| Hardware Specification | Yes | All the experiments are conducted on NVIDIA A100 GPUs with 80 GB memory. |
| Software Dependencies | No | The paper mentions 'torch.einsum' and 'opt_einsum' as libraries used, and notes the code is implemented using [54] and [68] as references. However, no specific version numbers for these software components or programming languages are provided. |
| Experiment Setup | Yes | In Table E.2, we show the hyperparameters used for the DROP experiments. Only LoRA and QuanTA are applied to the 13- and 70-billion-parameter LLaMA2 models. For the 13-billion-parameter model or smaller, only a single A100 GPU is used for fine-tuning, and for the 70-billion-parameter model, four A100 GPUs are used. |
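
The einsum pseudocode quoted in the table encodes QuanTA's quantum-circuit-inspired contraction: the hidden dimension is split into several factorized axes, and a set of four-index tensors, each acting on a pair of axes like a two-qubit gate, is contracted against them. Below is a minimal runnable sketch of those two contractions. The axis sizes, batch shape, and the choice of random tensors are illustrative assumptions for this sketch, not the paper's actual configuration; only the einsum strings and the names `T_1`, `T_2`, `T_3` come from the quoted pseudocode.

```python
import torch

# Illustrative (assumed) axis sizes: the hidden dimension is split into three
# factorized axes, e.g. 4 * 4 * 4 = 64. The real split depends on the model.
d_a, d_b, d_c = 4, 4, 4      # input axes  a, b, c
d_g, d_h, d_i = 4, 4, 4      # output axes g, h, i
d_int = 4                    # internal contracted indices d, e, f
batch, seq = 2, 8

# Activations reshaped so the hidden dimension becomes three axes (a, b, c).
x = torch.randn(batch, seq, d_a, d_b, d_c)

# Each T_k is a four-index tensor acting on a pair of axes, analogous to a
# two-qubit gate in a quantum circuit (names follow the table's pseudocode).
T_3 = torch.randn(d_int, d_int, d_b, d_c)   # indices e f b c
T_2 = torch.randn(d_int, d_i, d_a, d_int)   # indices d i a f
T_1 = torch.randn(d_g, d_h, d_int, d_int)   # indices g h d e

# First einsum from the table: apply the tensors directly to the activations.
y = torch.einsum("...abc,efbc,diaf,ghde->...ghi", x, T_3, T_2, T_1)
print(y.shape)        # torch.Size([2, 8, 4, 4, 4])

# Second einsum from the table: pre-contract the tensors into one operator,
# which can then be flattened into a dense update matrix.
W = torch.einsum("efbc,diaf,ghde->ghiabc", T_3, T_2, T_1)
W_matrix = W.reshape(d_g * d_h * d_i, d_a * d_b * d_c)
print(W_matrix.shape)  # torch.Size([64, 64])
```

The second form collapses the trainable tensors into a single dense operator, which is presumably what allows the learned adaptation to be folded back into the base weight matrix after fine-tuning.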