Can Language Models Teach? Teacher Explanations Improve Student Performance via Personalization

Authors: Swarnadeep Saha, Peter Hase, Mohit Bansal

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 5 Experiment Results
Researcher Affiliation | Academia | Swarnadeep Saha, Peter Hase, Mohit Bansal, Department of Computer Science, University of North Carolina at Chapel Hill, {swarna, peter, mbansal}@cs.unc.edu
Pseudocode | No | The paper describes the steps and processes involved in the teacher-student interaction and intervention functions (e.g., in Figure 1), but it does not include a formally labeled pseudocode or algorithm block.
Open Source Code | Yes | Code for all experiments: https://github.com/swarnaHub/ExplanationIntervention
Open Datasets | Yes | We experiment with three reasoning tasks: (1) StrategyQA [45], (2) GSM8k [46], and (3) CommonsenseQA [47] (details in Appendix B).
Dataset Splits | Yes | For StrategyQA, we report results on the validation split, while for CommonsenseQA and GSM8k, our experiments are on the test split.
Hardware Specification | Yes | We conduct experiments either on A100 Google Cloud instances or on internal A6000 GPU servers.
Software Dependencies | No | The paper mentions the use of LLMs like Flan-T5 and LLaMA, but it does not provide specific version numbers for any software dependencies or libraries like Python, PyTorch, or CUDA.
Experiment Setup | Yes | All models generate text using greedy decoding, prompted with 4-8 demonstrations. Unless otherwise stated, the demonstrations are randomly chosen from the training samples. For StrategyQA, we report results on the validation split, while for CommonsenseQA and GSM8k, our experiments are on the test split. To account for variance, we conduct experiments with at least three different seeds.
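
To make the reported setup concrete, below is a minimal sketch of few-shot prompting with greedy decoding, matching the Experiment Setup row. The Flan-T5 checkpoint, prompt format, and demonstration texts are illustrative assumptions; the paper does not pin these details here.

```python
# Minimal sketch (assumptions noted inline), not the authors' released code.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "google/flan-t5-large"  # assumed checkpoint; the paper uses Flan-T5 and LLaMA variants
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# 4-8 in-context demonstrations, randomly drawn from training samples (per the setup row).
# These StrategyQA-style examples are illustrative only.
demonstrations = [
    ("Q: Yes or no: Would a pear sink in water?", "A: No."),
    ("Q: Yes or no: Did Aristotle use a laptop?", "A: No."),
]
test_question = "Q: Yes or no: Is it common to see frost during some college commencements?"

prompt = "\n\n".join(f"{q}\n{a}" for q, a in demonstrations) + f"\n\n{test_question}\nA:"

inputs = tokenizer(prompt, return_tensors="pt")
# Greedy decoding: no sampling, single beam.
outputs = model.generate(**inputs, do_sample=False, num_beams=1, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```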