LLM Augmented LLMs: Expanding Capabilities through Composition

Authors: Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks, on par with fully fine-tuned counterparts. [Section 4, Experiments] We demonstrate the following in three domains: (a) an anchor LLM (m_B) can be composed with an augmenting model (m_A) trained on mappings between string keys and number values to solve arithmetic expressions over those keys, requiring both knowledge of the KV mappings and arithmetic capabilities (§4.1); (b) how CALM can be used to expand the language coverage of an anchor LLM (m_B) to low-resource languages it has not seen during pre-training... (c) how code completion and explanation can be improved... (a minimal composition sketch follows the table)
Researcher Affiliation | Industry | Rachit Bansal¹, Bidisha Samanta¹, Siddharth Dalmia², Nitish Gupta¹, Shikhar Vashishth¹, Sriram Ganapathy¹, Abhishek Bapna¹, Prateek Jain¹, Partha Talukdar¹ (¹Google Research India, ²Google DeepMind)
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a direct link to a code repository.
Open Datasets | Yes | We carry out these evaluations in a 5-shot in-context learning paradigm on the FLORES-200 (Costa-jussà et al., 2022) dataset. This dataset contains examples for 200 high- and low-resource languages. (ii) Performing grade school math word problems expressed in a non-English language: We evaluate on the multilingual version of the GSM-8K dataset (Shi et al., 2023)... (a 5-shot prompt sketch follows the table)
Dataset Splits | No | The paper describes using a small amount of "composition training data (D_C)" and "5-shot in-context learning" for evaluation, but does not specify traditional train/validation/test dataset splits with percentages or sample counts for the experimental setups.
Hardware Specification | No | The paper mentions various model sizes (e.g., PaLM2-XXS, PaLM2-XS, PaLM2-S) but does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing instances used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) that would be needed to replicate the experiments.
Experiment Setup | No | The paper describes architectural choices like setting N_A/n = 4 and using a "5-shot in-context learning paradigm" for evaluation, and also states settings for comparison methods such as LoRA rank, but it does not provide specific training hyperparameters such as learning rate, batch size, or number of epochs for its main model, CALM.
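
The Research Type and Experiment Setup rows quote the paper's core idea: an anchor model m_B is composed with a smaller augmenting model m_A, with a small set of new parameters trained over a handful of layer pairs (the N_A/n = 4 setting above). Since no code is released (see the Open Source Code row), the sketch below is only a minimal illustration of such a cross-attention composition layer; the module names, dimensions, residual form, and the toy layer-selection loop are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of a cross-attention composition layer in the spirit of CALM
# (illustrative only). Names, dimensions, and the residual form are assumptions;
# they are not taken from released code, since none is referenced above.
import torch
import torch.nn as nn


class CompositionLayer(nn.Module):
    """Cross-attends anchor-model states (queries) over projected
    augmenting-model states (keys/values) and adds the result residually."""

    def __init__(self, d_anchor: int, d_augment: int, n_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(d_augment, d_anchor)  # map m_A states into m_B's hidden space
        self.cross_attn = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)

    def forward(self, h_anchor: torch.Tensor, h_augment: torch.Tensor) -> torch.Tensor:
        kv = self.proj(h_augment)                        # (batch, T_A, d_anchor)
        attn_out, _ = self.cross_attn(h_anchor, kv, kv)  # queries come from the anchor model
        return h_anchor + attn_out                       # residual update fed onward in m_B


# Toy usage: compose states from 4 selected layer pairs (cf. the N_A/n = 4 setting above).
if __name__ == "__main__":
    batch, t_b, t_a, d_b, d_a, n_layers = 2, 16, 16, 512, 256, 4
    layers = nn.ModuleList([CompositionLayer(d_b, d_a) for _ in range(n_layers)])
    h_b = torch.randn(batch, t_b, d_b)
    for layer in layers:
        h_a = torch.randn(batch, t_a, d_a)  # stand-in for an augmenting-model layer output
        h_b = layer(h_b, h_a)               # only these new parameters would be trained
    print(h_b.shape)  # torch.Size([2, 16, 512])
```

In this reading, both base models stay frozen and only the projection and cross-attention parameters are learned from the small composition training data D_C mentioned in the Dataset Splits row.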
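
The Open Datasets row describes a 5-shot in-context evaluation on FLORES-200. A minimal sketch of how such a prompt could be assembled is shown below; the template wording, helper names, and the mention of chrF scoring are assumptions about a typical setup, not the paper's exact evaluation harness.

```python
# Sketch of assembling a 5-shot translation prompt, in the spirit of the
# FLORES-200 evaluation quoted in the Open Datasets row. The template and
# helper names here are illustrative assumptions, not the paper's format.
from typing import List, Tuple


def build_five_shot_prompt(exemplars: List[Tuple[str, str]], query: str,
                           src_lang: str, tgt_lang: str = "English") -> str:
    """Concatenate five (source, translation) demonstrations and the query."""
    assert len(exemplars) == 5, "5-shot evaluation uses exactly five demonstrations"
    blocks = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in exemplars]
    blocks.append(f"{src_lang}: {query}\n{tgt_lang}:")  # the model completes the translation
    return "\n\n".join(blocks)


# Toy usage with placeholder sentences; a real run would sample demonstrations
# from the FLORES-200 dev split and score completions with a metric such as chrF.
demos = [(f"source sentence {i}", f"reference translation {i}") for i in range(5)]
print(build_five_shot_prompt(demos, "sentence to translate", src_lang="Low-resource language"))
```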