Deep Attentive Model for Knowledge Tracing
Authors: Xinping Wang, Liangyu Chen, Min Zhang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through experimental results on three real-world datasets, DAKTN significantly outperforms state-of-the-art baseline models. We also present the reasonableness of DAKTN by ablation testing. |
| Researcher Affiliation | Academia | Xinping Wang, Liangyu Chen*, Min Zhang East China Normal University lychen@sei.ecnu.edu.cn |
| Pseudocode | No | The paper describes the model architecture and its components using mathematical formulations and diagrams, but it does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statements about releasing open-source code or include links to a code repository. |
| Open Datasets | Yes | In our experiments, we use the following three real-world datasets. The basic statistics of the datasets are summarized in Table 1. ASSIST09 and ASSIST12 are both collected by the ASSISTments online tutoring system [Feng, Heffernan, and Koedinger 2009]... EdNet [Choi et al. 2020] is an open dataset collected by a multi-platform AI tutoring service Santa. We use EdNet KT1 dataset in our experiments... |
| Dataset Splits | Yes | Then each dataset is divided into three parts, namely, 70%, 10% and 20% for training, validating and testing respectively. |
| Hardware Specification | Yes | All models are implemented by Tensorflow 2.3 using Python 3.7, and all experiments are executed on a CentOS Linux server with the main configuration of GPU RTX 2080Ti, CPU@3.30GHz, 8GB RAM, and 1TB SSD Disk. |
| Software Dependencies | Yes | All models are implemented by Tensorflow 2.3 using Python 3.7... |
| Experiment Setup | Yes | In the step of data preprocessing, we first filter students who have answered fewer than 5 exercises to ensure each student with enough learning data. Then each dataset is divided into three parts, namely, 70%, 10% and 20% for training, validating and testing respectively. As to the implementation parameters, we initialize the parameters with batch normalization, and use the Adam algorithm to optimize our model. The dimensions of the positive full connection layers in Equation (4) are 256, 256, 1 respectively. We set the learning rate as 0.001, and use dropout with p = 0.4 to alleviate overfitting. *(Hedged code sketches of the data split and the training configuration follow the table.)* |
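
The preprocessing and split described in the Experiment Setup row (drop students with fewer than 5 answered exercises, then divide each dataset 70% / 10% / 20% for training, validation, and testing) can be summarized in a short script. This is a minimal sketch, not the authors' code: the `student_id` column name and the choice to split at the student level are assumptions, since the quoted text does not specify a data schema or whether the split is per student or per interaction.

```python
import numpy as np
import pandas as pd


def preprocess_and_split(log: pd.DataFrame, seed: int = 42):
    """Filter short learning histories and split students 70% / 10% / 20%.

    Assumes a long-format interaction log with a `student_id` column; the
    column name and the student-level split are assumptions, since the paper
    only states the filter threshold and the split ratios.
    """
    # Drop students who answered fewer than 5 exercises.
    counts = log.groupby("student_id").size()
    kept = counts[counts >= 5].index.to_numpy()
    log = log[log["student_id"].isin(kept)]

    # Shuffle the remaining students and cut 70% / 10% / 20%.
    rng = np.random.default_rng(seed)
    rng.shuffle(kept)
    n_train = int(0.7 * len(kept))
    n_valid = int(0.1 * len(kept))
    train_ids = set(kept[:n_train])
    valid_ids = set(kept[n_train:n_train + n_valid])
    test_ids = set(kept[n_train + n_valid:])

    return (
        log[log["student_id"].isin(train_ids)],
        log[log["student_id"].isin(valid_ids)],
        log[log["student_id"].isin(test_ids)],
    )
```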
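
The quoted implementation parameters (layer dimensions 256, 256, 1 for the connection layers of Equation (4), the Adam optimizer with learning rate 0.001, and dropout p = 0.4) map directly onto TensorFlow 2.x / Keras primitives, matching the software stack reported above. The sketch below covers only this prediction head, not the full DAKTN model; the input dimension, activation functions, dropout placement, loss, and metric are illustrative assumptions rather than details stated in the paper.

```python
import tensorflow as tf


def build_prediction_head(input_dim: int = 128) -> tf.keras.Model:
    """Three-layer prediction head with sizes 256, 256, 1 and Adam(lr=0.001).

    `input_dim`, the ReLU/sigmoid activations, the dropout placement, and the
    binary cross-entropy loss are assumptions; only the layer sizes, learning
    rate, and dropout rate come from the quoted setup.
    """
    inputs = tf.keras.Input(shape=(input_dim,))
    x = tf.keras.layers.Dense(256, activation="relu")(inputs)
    x = tf.keras.layers.Dropout(0.4)(x)  # dropout with p = 0.4
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.4)(x)
    outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
        loss="binary_crossentropy",
        metrics=[tf.keras.metrics.AUC(name="auc")],
    )
    return model
```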