Can LLMs Learn by Teaching for Better Reasoning? A Preliminary Study

Authors: Xuefei Ning, Zifu Wang, Shiyao Li, Zinan Lin, Peiran Yao, Tianyu Fu, Matthew Blaschko, Guohao Dai, Huazhong Yang, Yu Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we provide a preliminary exploration of this question. We show that LbT ideas can be incorporated into existing LLM training/prompting pipelines and bring improvements. Specifically, we design three methods, each mimicking one of the three levels of LbT: observing students' feedback, learning from the feedback, and learning iteratively, with the goal of improving answer accuracy without training or improving models' inherent capability with fine-tuning. We reveal some findings: (1) Teaching materials that make it easier for students to learn (via in-context learning) have clearer and more accurate logic; (2) Weak-to-strong generalization: LbT might help improve strong models by teaching weak models; (3) Diversity in students might help: teaching multiple students could be better than teaching a single student or the teacher alone. We hope that our exploration can inspire future research on LbT and, more broadly, the adoption of advanced education techniques to improve LLMs. The code and website are at https://github.com/imagination-research/lbt and https://sites.google.com/view/llm-learning-by-teaching.
Researcher Affiliation | Collaboration | Xuefei Ning 1, Zifu Wang 2, Shiyao Li 1,3, Zinan Lin 4, Peiran Yao 3,5, Tianyu Fu 1,3, Matthew B. Blaschko 2, Guohao Dai 6,3, Huazhong Yang 1, Yu Wang 1; affiliations: 1 Tsinghua University, 2 KU Leuven, 3 Infinigence-AI, 4 Microsoft Research, 5 University of Alberta, 6 Shanghai Jiao Tong University
Pseudocode | Yes | Algorithm A1 (The Workflow of M1) and Algorithm A2 (The Workflow of M3) are explicitly provided and labeled as algorithms. (A hedged sketch of an M1-style workflow is given after the table.)
Open Source Code | Yes | The code and website are at https://github.com/imagination-research/lbt and https://sites.google.com/view/llm-learning-by-teaching.
Open Datasets | Yes | We use the extension MATH() [72] of the MATH dataset [27], where each problem has variants with different values. ... We use the Grandmaster Dynamic Programming (DP) study plan on LeetCode. ... We evaluate M3 on two binary text classification tasks: Liar [78] and Logical Fallacy [34].
Dataset Splits | Yes | Following the train-test split specified by [45], among the 500 test problems, 181 problems are provided with 3 functional variants each. We use these 181 problems as TPs. ... We report the teacher F1 score on the dev and test splits combined. ... The EPs are randomly sampled from the training data in each iteration. (A sketch of the TP/EP selection is given after the table.)
Hardware Specification | No | The paper mentions obtaining access to the "LUMI supercomputer" and "Leonardo supercomputer" but does not specify the CPU or GPU types or other detailed hardware specifications used for the experiments.
Software Dependencies | No | The paper refers to using Python for code generation and mentions specific LLM models (e.g., GPT-3.5-0613, the LLaMA3 family), but it does not list specific versions of programming languages, libraries, or frameworks (e.g., Python 3.x, PyTorch 1.x, CUDA x.x).
Experiment Setup | Yes | Following [73], we use top-K sampling with K=20 and a temperature of 0.7. ... For GPT-3.5-0613, we set the temperature and top-P to 1, while for the LLaMA3 family, we set the temperature to 0.6 and top-P to 0.9 (default setting). ... We use a learning rate of 5e-7, a batch size of 16, and 1 training epoch. We set β = 0.1 and add an additional NLL term [55] weighted by 50. (These hyperparameters are collected into a config sketch after the table.)
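
The "Pseudocode" row points to Algorithm A1 (the M1 workflow). Below is a minimal Python sketch, not the authors' exact algorithm, of how an M1-style selection loop can be organized: the teacher samples several rationale-answer pairs, each rationale is used as a one-shot in-context example for students on exam problems, and the answer whose rationales teach students best (highest aggregate LbT score) is returned. The callable signatures (teacher_generate, student_answer, is_correct) are assumptions made for illustration.

```python
# Minimal sketch of an M1-style (Algorithm A1) workflow, assuming hypothetical
# interfaces for the teacher and student models.

from collections import defaultdict

def m1_select_answer(teacher_generate, student_answer, is_correct,
                     teaching_problem, exam_problems, num_rationales=8):
    """Pick the teacher's answer for `teaching_problem` via LbT-weighted voting.

    teacher_generate(problem) -> (rationale, answer)
    student_answer(problem, demo) -> predicted answer, where `demo` is a
        (problem, rationale) pair used as a one-shot ICL example
    is_correct(prediction, exam_problem) -> bool
    exam_problems: problems similar to the TP with known reference answers
    """
    # 1. Sample several teaching rationale-answer (TR-TA) pairs from the teacher.
    candidates = [teacher_generate(teaching_problem) for _ in range(num_rationales)]

    # 2. LbT score of each pair = student accuracy on the EPs when taught
    #    with that rationale as an in-context example.
    votes = defaultdict(float)
    for rationale, answer in candidates:
        demo = (teaching_problem, rationale)
        correct = sum(is_correct(student_answer(ep, demo), ep) for ep in exam_problems)
        votes[answer] += correct / len(exam_problems)

    # 3. LbT-score-weighted voting over the distinct teacher answers.
    return max(votes, key=votes.get)
```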
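The "Dataset Splits" row states that 181 of the 500 MATH test problems come with 3 functional variants each and are used as TPs. A hypothetical sketch of that selection is shown below; the JSON layout and the field names "problem" and "variants" are assumptions, not the released data format.

```python
# Hypothetical TP/EP selection: keep test problems that have 3 functional
# variants, use the originals as teaching problems (TPs) and the variants
# as exam problems (EPs).

import json

def build_tp_ep_sets(path="math_test_with_variants.json"):
    with open(path) as f:
        test_problems = json.load(f)        # expected: the 500 test problems

    teaching_problems, exam_problems = [], {}
    for item in test_problems:
        variants = item.get("variants", [])
        if len(variants) == 3:              # 181 such problems per the paper
            teaching_problems.append(item["problem"])
            exam_problems[item["problem"]] = variants
    return teaching_problems, exam_problems
```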
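The "Experiment Setup" row quotes the decoding and fine-tuning hyperparameters. For reference, they are collected below into plain Python config dicts. Only the numeric values come from the paper; the dict structure and key names are ours, and grouping β and the added NLL term as a DPO-style preference objective is an assumption.

```python
# Hyperparameters quoted in the "Experiment Setup" row, gathered into
# illustrative config dicts (structure and key names are not from the paper).

RATIONALE_SAMPLING = {          # top-K sampling setting, following [73]
    "do_sample": True,
    "top_k": 20,
    "temperature": 0.7,
}

STUDENT_DECODING = {
    "gpt-3.5-0613": {"temperature": 1.0, "top_p": 1.0},
    "llama3":       {"temperature": 0.6, "top_p": 0.9},  # model default setting
}

FINETUNING = {                  # presumably a DPO-style preference objective
    "learning_rate": 5e-7,
    "batch_size": 16,
    "num_epochs": 1,
    "beta": 0.1,                # β in the preference loss
    "nll_loss_weight": 50,      # additional NLL term [55]
}
```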