Deep Attentive Model for Knowledge Tracing

Authors: Xinping Wang, Liangyu Chen, Min Zhang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through experimental results on three real-world datasets, DAKTN significantly outperforms state-of-the-art baseline models. We also present the reasonableness of DAKTN by ablation testing."
Researcher Affiliation | Academia | Xinping Wang, Liangyu Chen*, Min Zhang (East China Normal University), lychen@sei.ecnu.edu.cn
Pseudocode | No | The paper describes the model architecture and its components through mathematical formulations and diagrams, but it includes no explicit pseudocode or algorithm block.
Open Source Code | No | The paper makes no statement about releasing open-source code and provides no link to a code repository.
Open Datasets | Yes | "In our experiments, we use the following three real-world datasets. The basic statistics of the datasets are summarized in Table 1. ASSIST09 and ASSIST12 are both collected by the ASSISTments online tutoring system [Feng, Heffernan, and Koedinger 2009]... EdNet [Choi et al. 2020] is an open dataset collected by a multi-platform AI tutoring service, Santa. We use the EdNet KT1 dataset in our experiments..."
Dataset Splits | Yes | "Then each dataset is divided into three parts, namely, 70%, 10% and 20% for training, validating and testing respectively."
Hardware Specification | Yes | "All models are implemented by TensorFlow 2.3 using Python 3.7, and all experiments are executed on a CentOS Linux server with the main configuration of GPU RTX 2080Ti, CPU@3.30GHz, 8GB RAM, and 1TB SSD Disk."
Software Dependencies | Yes | "All models are implemented by TensorFlow 2.3 using Python 3.7..."
Experiment Setup | Yes | "In the step of data preprocessing, we first filter students who have answered fewer than 5 exercises to ensure each student with enough learning data. Then each dataset is divided into three parts, namely, 70%, 10% and 20% for training, validating and testing respectively. As to the implementation parameters, we initialize the parameters with batch normalization, and use the Adam algorithm to optimize our model. The dimensions of the positive full connection layers in Equation (4) are 256, 256, 1 respectively. We set the learning rate as 0.001, and use dropout with p = 0.4 to alleviate overfitting."
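
The Dataset Splits and Experiment Setup rows above pin down most of the training recipe. Because the authors released no code, the sketch below is only a plausible reconstruction in the stack the paper names (TensorFlow / Python): the function names, the ReLU/sigmoid activations, the dropout placement, and the binary cross-entropy loss are assumptions, while the minimum of 5 answered exercises, the 70/10/20 split, the 256-256-1 layer dimensions, the Adam optimizer, the 0.001 learning rate, and dropout p = 0.4 come from the quoted setup.

```python
# Hedged reconstruction of the reported preprocessing and training setup.
# Names and structural details are illustrative; only the quoted
# hyperparameters are taken from the paper.
import numpy as np
import tensorflow as tf

def filter_and_split(sequences, min_len=5, seed=42):
    """Drop students with fewer than `min_len` answered exercises, then
    split the remaining student sequences 70%/10%/20% into
    train/validation/test sets (seed and shuffling are assumptions)."""
    kept = [s for s in sequences if len(s) >= min_len]
    rng = np.random.default_rng(seed)
    rng.shuffle(kept)
    n_train = int(0.7 * len(kept))
    n_val = int(0.1 * len(kept))
    return (kept[:n_train],
            kept[n_train:n_train + n_val],
            kept[n_train + n_val:])

def prediction_head(hidden_dim=256, dropout_rate=0.4):
    """Fully connected prediction layers with output dimensions 256, 256, 1,
    matching those reported for Equation (4); activations and where the
    dropout is applied are assumptions."""
    return tf.keras.Sequential([
        tf.keras.layers.Dense(hidden_dim, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(hidden_dim, activation="relu"),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

# Adam optimizer with the reported learning rate; binary cross-entropy is the
# usual knowledge-tracing objective and is assumed here, not quoted.
optimizer = tf.keras.optimizers.Adam(learning_rate=0.001)
loss_fn = tf.keras.losses.BinaryCrossentropy()
```

The sketch deliberately omits the attentive encoder itself, which the paper specifies only through its equations and diagrams, so it should be read as scaffolding around the reported hyperparameters rather than a reimplementation of DAKTN.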