Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing

Authors: Shangshang Yang, Xiaoshan Yu, Ye Tian, Xueming Yan, Haiping Ma, Xingyi Zhang

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on the two largest and most challenging education datasets demonstrate the effectiveness of the architecture found by the proposed approach.
Researcher Affiliation | Academia | Shangshang Yang (1,3), Xiaoshan Yu (1,3), Ye Tian (1,3), Xueming Yan (2), Haiping Ma (1,3,4), Xingyi Zhang (1,3); 1 Anhui University; 2 Guangdong University of Foreign Studies; 3 Key Laboratory of Intelligent Computing and Signal Processing of Ministry of Education; 4 Department of Information Materials and Intelligent Sensing Laboratory of Anhui Province
Pseudocode | Yes | Algorithm 1: Main Steps of ENAS-KT; Algorithm 2: Search Space Reduction
Open Source Code | Yes | The source code of the proposed approach is publicly available at https://github.com/DevilYangS/ENAS-KT.
Open Datasets | Yes | For convincing validation, two largest and most challenging real-world education datasets EdNet [8] and RAIEd2020 [34] were used in experiments, whose statistics are summarized in Table 1.
Dataset Splits | Yes | all students were randomly split into 70%/10%/20% for training/validation/testing. The maximal length of input sequences was set to 100 (L=100), we truncated student learning interactions longer than 100 to several sub-sequences, and 5-fold cross-validation was used. (See the split and truncation sketch below.)
Hardware Specification | Yes | All experiments were implemented with PyTorch and run under NVIDIA 3080 GPU.
Software Dependencies | No | The paper mentions "PyTorch" as software used but does not specify a version number or other software dependencies with their versions.
Experiment Setup | Yes | The maximal length of input sequences was set to 100 (L=100), we truncated student learning interactions longer than 100 to several sub-sequences, and 5-fold cross-validation was used. The number of blocks N=4, embedding size D=128, the hidden dimension of FFN was set to 128. To train the supernet, epoch number, learning rate, dropout rate, and batch size was set to 60, 1e-3, 0.1, and 128, and the number of warm-up steps in Noam scheme [7] was set to 8,000. To train the best architecture, epoch number and the number of warm-up steps was set to 30 and 4,000, and others kept same as above. (See the Noam warm-up sketch below.)
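
The student-level split and sequence truncation quoted in the Dataset Splits entry can be sketched as follows. This is an illustrative reconstruction, not code from the ENAS-KT repository: the function names, the random seed, and the use of NumPy are assumptions; only the 70%/10%/20% ratios and the maximum length L=100 come from the paper.

    import numpy as np

    # Illustrative sketch (not the authors' code): split students 70%/10%/20%
    # and cut interaction sequences longer than L = 100 into sub-sequences.
    MAX_LEN = 100  # L = 100 in the paper

    def split_students(student_ids, seed=0):
        """Randomly split student ids into train/validation/test (70%/10%/20%)."""
        rng = np.random.default_rng(seed)
        ids = rng.permutation(np.asarray(student_ids))
        n_train = int(0.7 * len(ids))
        n_val = int(0.1 * len(ids))
        return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

    def truncate_sequence(interactions, max_len=MAX_LEN):
        """Cut one student's interaction sequence into sub-sequences of length <= max_len."""
        return [interactions[i:i + max_len] for i in range(0, len(interactions), max_len)]

The paper additionally reports 5-fold cross-validation; how the folds are formed relative to this split is not specified in the quoted text, so it is left out of the sketch.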
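
The supernet training configuration quoted in the Experiment Setup entry uses the Noam warm-up scheme of Vaswani et al. [7]. The sketch below is a hedged illustration: the placeholder model, the choice of the Adam optimizer, and the way the Noam factor plays the role of the stated learning rate are assumptions; only the hyperparameter values (D=128, 8,000 warm-up steps for the supernet, 4,000 for retraining the best architecture, 60 vs. 30 epochs, batch size 128, dropout 0.1) come from the paper.

    import torch
    from torch.optim.lr_scheduler import LambdaLR

    # Hedged sketch of the Noam warm-up schedule for supernet training.
    D_MODEL = 128        # embedding size D from the paper
    WARMUP_STEPS = 8000  # 4,000 when retraining the best architecture

    def noam_factor(step, d_model=D_MODEL, warmup=WARMUP_STEPS):
        """Noam scheme: linear warm-up, then decay proportional to 1/sqrt(step)."""
        step = max(step, 1)
        return d_model ** -0.5 * min(step ** -0.5, step * warmup ** -1.5)

    model = torch.nn.Linear(D_MODEL, 1)  # stand-in for the searched Transformer supernet
    optimizer = torch.optim.Adam(model.parameters(), lr=1.0)  # Noam factor acts as the lr
    scheduler = LambdaLR(optimizer, lr_lambda=noam_factor)
    # Per batch: optimizer.step() followed by scheduler.step().

With D=128 and 8,000 warm-up steps, the Noam factor peaks at roughly 1e-3, which is consistent with the learning rate stated in the paper; whether the authors combine the two in exactly this way is not specified in the quoted text.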