Can Language Models Teach? Teacher Explanations Improve Student Performance via Personalization
Authors: Swarnadeep Saha, Peter Hase, Mohit Bansal
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The paper reports experimental results (5 experiment results identified). |
| Researcher Affiliation | Academia | Swarnadeep Saha, Peter Hase, Mohit Bansal; Department of Computer Science, University of North Carolina at Chapel Hill; {swarna, peter, mbansal}@cs.unc.edu |
| Pseudocode | No | The paper describes the steps and processes involved in the teacher-student interaction and intervention functions (e.g., in Figure 1), but it does not include a formally labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code for all experiments: https://github.com/swarnaHub/ExplanationIntervention |
| Open Datasets | Yes | We experiment with three reasoning tasks: (1) StrategyQA [45], (2) GSM8k [46], and (3) CommonsenseQA [47] (details in Appendix B). |
| Dataset Splits | Yes | For StrategyQA, we report results on the validation split, while for CommonsenseQA and GSM8k, our experiments are on the test split. (See the loading sketch below the table.) |
| Hardware Specification | Yes | We conduct experiments either on A100 Google Cloud instances or on internal A6000 GPU servers. |
| Software Dependencies | No | The paper mentions the use of LLMs like Flan-T5 and LLaMA, but it does not provide specific version numbers for any software dependencies or libraries like Python, PyTorch, or CUDA. |
| Experiment Setup | Yes | All models generate text using greedy decoding, prompted with 4 to 8 demonstrations. Unless otherwise stated, the demonstrations are randomly chosen from the training samples. For StrategyQA, we report results on the validation split, while for CommonsenseQA and GSM8k, our experiments are on the test split. To account for variance, we conduct experiments with at least three different seeds. (See the decoding sketch below the table.) |
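For reference, here is a minimal sketch of loading the evaluation splits the paper uses, assuming the Hugging Face `datasets` library. The GSM8k and CommonsenseQA dataset IDs are the standard Hub names; StrategyQA is distributed by AI2 as JSON rather than through the Hub, so the local file path shown is an illustrative assumption.

```python
from datasets import load_dataset

# GSM8k: the paper evaluates on the test split ("gsm8k"/"main" is the
# standard Hub ID).
gsm8k_test = load_dataset("gsm8k", "main", split="test")

# CommonsenseQA: the paper evaluates on the test split; note that the
# Hub's test split ships without gold labels, so scoring it requires
# the official label files.
csqa_test = load_dataset("commonsense_qa", split="test")

# StrategyQA: the paper evaluates on the validation split, loaded here
# from a local JSON file (the path is illustrative, not official).
strategyqa_val = load_dataset(
    "json", data_files={"validation": "strategyqa/dev.json"}
)["validation"]
```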
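And a minimal sketch of the generation setup described in the Experiment Setup row, assuming Flan-T5 via Hugging Face `transformers`. The checkpoint name, prompt format, and field names of the training examples are illustrative assumptions; passing `do_sample=False` with the default single beam yields the greedy decoding the paper describes.

```python
import random
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "google/flan-t5-large"  # assumed checkpoint; paper tests several LLMs
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def answer(question, train_pool, n_demos=4, seed=0):
    """Few-shot greedy generation: sample demonstrations at random from
    the training pool (4-8 per the paper) and decode greedily."""
    rng = random.Random(seed)  # one seed per run; the paper uses >= 3 seeds
    demos = rng.sample(train_pool, k=n_demos)
    # Assumed demo format: dicts with "question" and "answer" keys.
    prompt = "\n\n".join(
        f"Q: {d['question']}\nA: {d['answer']}" for d in demos
    ) + f"\n\nQ: {question}\nA:"
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, do_sample=False, max_new_tokens=256)
    return tokenizer.decode(out[0], skip_special_tokens=True)
```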