SocraticLM: Exploring Socratic Personalized Teaching with Large Language Models
Authors: Jiayu Liu, Zhenya Huang, Tong Xiao, Jing Sha, Jinze Wu, Qi Liu, Shijin Wang, Enhong Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments verify that SocraticLM achieves significant improvements in the teaching performance, outperforming GPT4 by more than 12%. Our dataset and code is available at https://github.com/Ljyustc/SocraticLM. |
| Researcher Affiliation | Collaboration | Jiayu Liu1,2, Zhenya Huang1,2, Tong Xiao1,2, Jing Sha2, Jinze Wu2, Qi Liu1,2, Shijin Wang2, Enhong Chen1,2. 1: University of Science and Technology of China; 2: State Key Laboratory of Cognitive Intelligence. Emails: {jy251198,tongxiao2002}@mail.ustc.edu.cn; {huangzhy,qiliuql,cheneh}@ustc.edu.cn; {jingsha,jzwu4,sjwang3}@iflytek.com |
| Pseudocode | No | The paper describes processes and pipelines but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our dataset and code is available at https://github.com/Ljyustc/SocraticLM. |
| Open Datasets | Yes | Our problems are sourced from two representative datasets: MAWPS [27] and GSM8K [8]... We construct a new dataset, SocraTeach, which consists of 35K high-quality, fine-grained Socratic-style multi-round teaching dialogues... Our dataset and code is available at https://github.com/Ljyustc/SocraticLM. |
| Dataset Splits | Yes | Of the remaining data in SocraTeach, 10%/90% is used for validation/training. |
| Hardware Specification | Yes | All experiments are conducted on a server with six NVIDIA RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions 'ChatGLM3-6b' as the base model for fine-tuning but does not list the software frameworks (e.g., PyTorch, TensorFlow) or library version numbers that are key dependencies for reproducibility. |
| Experiment Setup | Yes | Our SocraticLM is obtained by P-Tuning [36] ChatGLM3-6b (not ChatGLM3-6b-Base) for 2 epochs with a learning rate of 0.02 and batch size of 64. |
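
The 'Dataset Splits' and 'Experiment Setup' rows above give the key reproduction parameters: a 90%/10% train/validation split of SocraTeach, and P-Tuning of ChatGLM3-6b for 2 epochs with a learning rate of 0.02 and batch size of 64. The sketch below assembles these into a minimal training skeleton under stated assumptions: the use of Hugging Face transformers, peft, and datasets, the prompt-encoder sizes, the data file path, and the single-process batch-size decomposition are illustrative choices, not details taken from the paper.

```python
# Minimal sketch of the reported recipe, assuming Hugging Face transformers,
# peft, and datasets. Only the base model, epoch count, learning rate, effective
# batch size, and 90%/10% split come from the paper; everything else is a
# placeholder for illustration.
import torch
from datasets import load_dataset
from transformers import AutoModel, AutoTokenizer, TrainingArguments
from peft import PromptEncoderConfig, TaskType, get_peft_model

MODEL_NAME = "THUDM/chatglm3-6b"  # chat variant, not chatglm3-6b-base (per the paper)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_NAME, trust_remote_code=True, torch_dtype=torch.float16
)

# "P-Tuning" realized here via peft's prompt encoder; the virtual-token count
# and encoder size are illustrative values, not reported in the paper.
peft_config = PromptEncoderConfig(
    task_type=TaskType.CAUSAL_LM,
    num_virtual_tokens=128,
    encoder_hidden_size=128,
)
model = get_peft_model(model, peft_config)

# Hypothetical path to the released SocraTeach dialogues, followed by the
# reported 90%/10% train/validation split.
raw = load_dataset("json", data_files="SocraTeach/dialogues.json")["train"]
splits = raw.train_test_split(test_size=0.1, seed=42)
train_data, valid_data = splits["train"], splits["test"]

training_args = TrainingArguments(
    output_dir="socraticlm-ptuning",
    num_train_epochs=2,              # reported in the paper
    learning_rate=2e-2,              # reported in the paper
    per_device_train_batch_size=4,   # assumption: 4 x 16 accumulation steps
    gradient_accumulation_steps=16,  # ~= effective batch of 64 in a single process
    fp16=True,
)

# Dialogue tokenization, the data collator, and the Trainer/launcher wiring are
# omitted: they depend on preprocessing choices the table does not specify.
```

A full run would also need multi-GPU launching to match the reported six RTX 3090s; the authors' released repository is the authoritative source for the exact pipeline.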