Can Language Models Teach? Teacher Explanations Improve Student Performance via Personalization
Authors: Swarnadeep Saha, Peter Hase, Mohit Bansal
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper reports experimental results (5 experiment results identified). |
| Researcher Affiliation | Academia | Swarnadeep Saha, Peter Hase, Mohit Bansal; Department of Computer Science, University of North Carolina at Chapel Hill; {swarna, peter, mbansal}@cs.unc.edu |
| Pseudocode | No | The paper describes the steps and processes involved in the teacher-student interaction and intervention functions (e.g., in Figure 1), but it does not include a formally labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code for all experiments: https://github.com/swarnaHub/ExplanationIntervention |
| Open Datasets | Yes | We experiment with three reasoning tasks: (1) StrategyQA [45], (2) GSM8k [46], and (3) CommonsenseQA [47] (details in Appendix B). |
| Dataset Splits | Yes | For StrategyQA, we report results on the validation split, while for CommonsenseQA and GSM8k, our experiments are on the test split. (See the loading sketch below the table.) |
| Hardware Specification | Yes | We conduct experiments either on A100 Google Cloud instances or on internal A6000 GPU servers. |
| Software Dependencies | No | The paper mentions the use of LLMs like Flan-T5 and LLaMA, but it does not provide specific version numbers for any software dependencies or libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | All models generate text using greedy decoding, prompted with 4 to 8 demonstrations. Unless otherwise stated, the demonstrations are randomly chosen from the training samples. For StrategyQA, we report results on the validation split, while for CommonsenseQA and GSM8k, our experiments are on the test split. To account for variance, we conduct experiments with at least three different seeds. (See the decoding sketch below the table.) |
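For reference, here is a minimal sketch of loading the evaluation splits the paper uses, assuming the Hugging Face `datasets` library. The GSM8k and CommonsenseQA dataset IDs are the standard Hub names; StrategyQA is distributed by AI2 as JSON rather than through the Hub, so the local file path shown is an illustrative assumption.

```python
from datasets import load_dataset

# GSM8k: the paper evaluates on the test split ("gsm8k"/"main" is the
# standard Hub ID).
gsm8k_test = load_dataset("gsm8k", "main", split="test")

# CommonsenseQA: the paper evaluates on the test split; note that the
# Hub's test split ships without gold labels, so scoring it requires
# the official label files.
csqa_test = load_dataset("commonsense_qa", split="test")

# StrategyQA: the paper evaluates on the validation split, loaded here
# from a local JSON file (the path is illustrative, not official).
strategyqa_val = load_dataset(
    "json", data_files={"validation": "strategyqa/dev.json"}
)["validation"]
```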
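And a minimal sketch of the generation setup described in the Experiment Setup row, assuming Flan-T5 via Hugging Face `transformers`. The checkpoint name, prompt format, and field names of the training examples are illustrative assumptions; passing `do_sample=False` with the default single beam yields the greedy decoding the paper describes.

```python
import random
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-large"  # assumed checkpoint; paper tests several LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def answer(question, train_pool, n_demos=4, seed=0):
    """Few-shot greedy generation: sample demonstrations at random from
    the training pool (4-8 per the paper) and decode greedily."""
    rng = random.Random(seed)  # one seed per run; the paper uses >= 3 seeds
    demos = rng.sample(train_pool, k=n_demos)
    # Assumed demo format: dicts with "question" and "answer" keys.
    prompt = "\n\n".join(
        f"Q: {d['question']}\nA: {d['answer']}" for d in demos
    ) + f"\n\nQ: {question}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=256)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```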