SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models

Authors: Jiayu Liu, Zhenya Huang, Tong Xiao, Jing Sha, Jinze Wu, Qi Liu, Shijin Wang, Enhong Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify that SocraticLM achieves significant improvements in the teaching performance, outperforming GPT4 by more than 12%. Our dataset and code is available at https://github.com/Ljyustc/SocraticLM.
Researcher Affiliation | Collaboration | Jiayu Liu (1,2), Zhenya Huang (1,2), Tong Xiao (1,2), Jing Sha (2), Jinze Wu (2), Qi Liu (1,2), Shijin Wang (2), Enhong Chen (1,2). 1: University of Science and Technology of China; 2: State Key Laboratory of Cognitive Intelligence. Emails: {jy251198,tongxiao2002}@mail.ustc.edu.cn; {huangzhy,qiliuql,cheneh}@ustc.edu.cn; {jingsha,jzwu4,sjwang3}@iflytek.com
Pseudocode | No | The paper describes processes and pipelines but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our dataset and code is available at https://github.com/Ljyustc/SocraticLM.
Open Datasets | Yes | Our problems are sourced from two representative datasets: MAWPS [27] and GSM8K [8]... We construct a new dataset, SocraTeach, which consists of 35K high-quality, fine-grained Socratic-style multi-round teaching dialogues... Our dataset and code is available at https://github.com/Ljyustc/SocraticLM.
Dataset Splits | Yes | Of the remaining data in SocraTeach, 10%/90% is used for validation/training.
Hardware Specification | Yes | All experiments are conducted on a server with six NVIDIA RTX 3090 GPUs.
Software Dependencies | No | The paper mentions 'ChatGLM3-6b' as the base model for fine-tuning but does not list specific software frameworks (e.g., PyTorch, TensorFlow) or libraries with version numbers that are key dependencies for reproducibility.
Experiment Setup | Yes | Our SocraticLM is obtained by P-Tuning [36] ChatGLM3-6b (not ChatGLM3-6b-Base) for 2 epochs with a learning rate of 0.02 and batch size of 64.
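
The "Dataset Splits" row reports that 10%/90% of the remaining SocraTeach data goes to validation/training. A minimal sketch of such a split follows; the file name socrateach_dialogues.json, its record format, and the fixed random seed are illustrative assumptions, not details taken from the released dataset at https://github.com/Ljyustc/SocraticLM.

    # Hypothetical sketch of the 10%/90% validation/training split reported for SocraTeach.
    # The file name and record layout are assumptions; the released data may be organized differently.
    import json
    import random

    random.seed(0)  # fixed seed so the split is reproducible across runs (assumed, not reported)

    with open("socrateach_dialogues.json", "r", encoding="utf-8") as f:
        dialogues = json.load(f)  # assumed: a list of multi-round Socratic teaching dialogues

    random.shuffle(dialogues)
    n_val = int(0.10 * len(dialogues))   # 10% for validation
    val_set = dialogues[:n_val]
    train_set = dialogues[n_val:]        # remaining 90% for training

    print(f"train: {len(train_set)}  val: {len(val_set)}")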
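
The "Experiment Setup" row gives the fine-tuning recipe: P-Tuning ChatGLM3-6b for 2 epochs with a learning rate of 0.02 and a batch size of 64 on six RTX 3090s. The sketch below reproduces those hyperparameters using the Hugging Face peft implementation of P-Tuning as a stand-in for the P-Tuning recipe the paper cites [36]; the number of virtual tokens, the prompt-encoder size, the per-device/accumulation split of the batch, and peft's compatibility with the ChatGLM3 custom modeling code are all assumptions, and the authors' actual setup may differ.

    # Hypothetical sketch, not the authors' released training code.
    from transformers import AutoModel, AutoTokenizer, TrainingArguments
    from peft import PromptEncoderConfig, TaskType, get_peft_model

    BASE = "THUDM/chatglm3-6b"  # the chat model, not the -Base variant, per the paper

    tokenizer = AutoTokenizer.from_pretrained(BASE, trust_remote_code=True)
    model = AutoModel.from_pretrained(BASE, trust_remote_code=True)

    # P-Tuning: only a small prompt encoder producing continuous prompt embeddings is trained.
    # num_virtual_tokens and encoder_hidden_size are illustrative guesses, not reported values.
    peft_config = PromptEncoderConfig(
        task_type=TaskType.CAUSAL_LM,
        num_virtual_tokens=128,
        encoder_hidden_size=512,
    )
    model = get_peft_model(model, peft_config)
    model.print_trainable_parameters()  # only the prompt-encoder parameters should be trainable

    # Hyperparameters from the "Experiment Setup" row; the 8 x 8 decomposition of the
    # effective batch size of 64 is an assumption (the paper does not say how the batch
    # is distributed across the six RTX 3090 GPUs).
    args = TrainingArguments(
        output_dir="socraticlm-ptuning",
        num_train_epochs=2,
        learning_rate=0.02,
        per_device_train_batch_size=8,
        gradient_accumulation_steps=8,
        logging_steps=50,
    )
    # A transformers.Trainer over the tokenized SocraTeach dialogues would follow here;
    # data collation for multi-round teaching dialogues is omitted from this sketch.

The comparatively high learning rate of 0.02 is typical for prompt-/P-tuning methods, where only the small prompt encoder is updated rather than the full 6B-parameter model.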