LLM Augmented LLMs: Expanding Capabilities through Composition

Authors: Rachit Bansal, Bidisha Samanta, Siddharth Dalmia, Nitish Gupta, Shikhar Vashishth, Sriram Ganapathy, Abhishek Bapna, Prateek Jain, Partha Talukdar

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate that augmenting PaLM2-S with a smaller model trained on low-resource languages results in an absolute improvement of up to 13% on tasks like translation into English and arithmetic reasoning for low-resource languages. Similarly, when PaLM2-S is augmented with a code-specific model, we see a relative improvement of 40% over the base model for code generation and explanation tasks, on par with fully fine-tuned counterparts. [Section 4, Experiments] We demonstrate the following in three domains: (a) an anchor LLM (m_B) can be composed with an augmenting model (m_A) trained on mappings between string keys and number values to solve arithmetic expressions over those keys, requiring both knowledge of the KV mappings and arithmetic capabilities (§4.1); (b) how CALM can be used to expand the language coverage of an anchor LLM (m_B) to low-resource languages it has not seen during pre-training... (c) how code completion and explanation can be improved... (a minimal composition sketch follows the table)
Researcher Affiliation | Industry | Rachit Bansal¹, Bidisha Samanta¹, Siddharth Dalmia², Nitish Gupta¹, Shikhar Vashishth¹, Sriram Ganapathy¹, Abhishek Bapna¹, Prateek Jain¹, Partha Talukdar¹ (¹Google Research India, ²Google DeepMind)
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | No | The paper does not contain an explicit statement about releasing source code for the described methodology or a direct link to a code repository.
Open Datasets | Yes | We carry out these evaluations in a 5-shot in-context learning paradigm on the FLORES-200 (Costa-jussà et al., 2022) dataset. This dataset contains examples for 200 high- and low-resource languages. (ii) Performing grade school math word problems expressed in a non-English language: We evaluate on the multilingual version of the GSM-8K dataset (Shi et al., 2023)... (a 5-shot prompt sketch follows the table)
Dataset Splits | No | The paper describes using a small amount of "composition training data (D_C)" and "5-shot in-context learning" for evaluation, but does not specify traditional train/validation/test dataset splits with percentages or sample counts for the experimental setups.
Hardware Specification | No | The paper mentions various model sizes (e.g., PaLM2-XXS, PaLM2-XS, PaLM2-S) but does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing instances used for the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or frameworks with their versions) that would be needed to replicate the experiments.
Experiment Setup | No | The paper describes architectural choices like setting N_A/n = 4 and using a "5-shot in-context learning paradigm" for evaluation, and also states settings for comparison methods such as LoRA rank, but it does not provide specific training hyperparameters such as learning rate, batch size, or number of epochs for its main model, CALM.
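
The Research Type and Experiment Setup rows quote the paper's core idea: an anchor model m_B is composed with a smaller augmenting model m_A, with a small set of new parameters trained over a handful of layer pairs (the N_A/n = 4 setting above). Since no code is released (see the Open Source Code row), the sketch below is only a minimal illustration of such a cross-attention composition layer; the module names, dimensions, residual form, and the toy layer-selection loop are assumptions for illustration, not the paper's exact architecture.

```python
# Minimal sketch of a cross-attention composition layer in the spirit of CALM
# (illustrative only). Names, dimensions, and the residual form are assumptions;
# they are not taken from released code, since none is referenced above.
import torch
import torch.nn as nn


class CompositionLayer(nn.Module):
    """Cross-attends anchor-model states (queries) over projected
    augmenting-model states (keys/values) and adds the result residually."""

    def __init__(self, d_anchor: int, d_augment: int, n_heads: int = 8):
        super().__init__()
        self.proj = nn.Linear(d_augment, d_anchor)  # map m_A states into m_B's hidden space
        self.cross_attn = nn.MultiheadAttention(d_anchor, n_heads, batch_first=True)

    def forward(self, h_anchor: torch.Tensor, h_augment: torch.Tensor) -> torch.Tensor:
        kv = self.proj(h_augment)                        # (batch, T_A, d_anchor)
        attn_out, _ = self.cross_attn(h_anchor, kv, kv)  # queries come from the anchor model
        return h_anchor + attn_out                       # residual update fed onward in m_B


# Toy usage: compose states from 4 selected layer pairs (cf. the N_A/n = 4 setting above).
if __name__ == "__main__":
    batch, t_b, t_a, d_b, d_a, n_layers = 2, 16, 16, 512, 256, 4
    layers = nn.ModuleList([CompositionLayer(d_b, d_a) for _ in range(n_layers)])
    h_b = torch.randn(batch, t_b, d_b)
    for layer in layers:
        h_a = torch.randn(batch, t_a, d_a)  # stand-in for an augmenting-model layer output
        h_b = layer(h_b, h_a)               # only these new parameters would be trained
    print(h_b.shape)  # torch.Size([2, 16, 512])
```

In this reading, both base models stay frozen and only the projection and cross-attention parameters are learned from the small composition training data D_C mentioned in the Dataset Splits row.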
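
The Open Datasets row describes a 5-shot in-context evaluation on FLORES-200. A minimal sketch of how such a prompt could be assembled is shown below; the template wording, helper names, and the mention of chrF scoring are assumptions about a typical setup, not the paper's exact evaluation harness.

```python
# Sketch of assembling a 5-shot translation prompt, in the spirit of the
# FLORES-200 evaluation quoted in the Open Datasets row. The template and
# helper names here are illustrative assumptions, not the paper's format.
from typing import List, Tuple


def build_five_shot_prompt(exemplars: List[Tuple[str, str]], query: str,
                           src_lang: str, tgt_lang: str = "English") -> str:
    """Concatenate five (source, translation) demonstrations and the query."""
    assert len(exemplars) == 5, "5-shot evaluation uses exactly five demonstrations"
    blocks = [f"{src_lang}: {src}\n{tgt_lang}: {tgt}" for src, tgt in exemplars]
    blocks.append(f"{src_lang}: {query}\n{tgt_lang}:")  # the model completes the translation
    return "\n\n".join(blocks)


# Toy usage with placeholder sentences; a real run would sample demonstrations
# from the FLORES-200 dev split and score completions with a metric such as chrF.
demos = [(f"source sentence {i}", f"reference translation {i}") for i in range(5)]
print(build_five_shot_prompt(demos, "sentence to translate", src_lang="Low-resource language"))
```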