Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

simpleKT: A Simple But Tough-to-Beat Baseline for Knowledge Tracing

Authors: Zitao Liu, Qiongqiong Liu, Jiahao Chen, Shuyan Huang, Weiqi Luo

ICLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments show that such a simple baseline is able to always rank top 3 in terms of AUC scores and achieve 57 wins, 3 ties and 16 loss against 12 DLKT baseline methods on 7 public datasets of different domains.
Researcher Affiliation	Collaboration	Zitao Liu Guangdong Institute of Smart Education, Jinan University, Guangzhou, China EMAIL Qiongqiong Liu, Jiahao Chen, Shuyan Huang TAL Education Group, Beijing, China EMAIL Weiqi Luo Guangdong Institute of Smart Education, Jinan University, Guangzhou, China EMAIL
Pseudocode	No	The paper describes the proposed method using mathematical formulations and descriptive text, but does not include a dedicated pseudocode or algorithm block.
Open Source Code	Yes	Code is available at https://github.com/pykt-team/pykt-toolkit. To encourage reproducible research, all the related codes, data and the learned SIMPLEKT models are publicly available at https://github.com/pykt-team/pykt-toolkit. The code of SIMPLEKT and its variants, i.e., SIMPLEKT-ScalarDiff and SIMPLEKT-NoDiff, to reproduce the experimental results can be found at https://github.com/pykt-team/pykt-toolkit.
Open Datasets	Yes	In this paper, we experiment with 7 widely used datasets to comprehensively evaluate the performance of our models. ... ASSISTments2009 (AS2009): ... https://sites.google.com/site/assistmentsdata/home/2009-2010-assistment-data/skill-builder-data-2009-2010. Algebra2005 (AL2005): ... https://pslcdatashop.web.cmu.edu/KDDCup/. Bridge2006 (BD2006): ... https://pslcdatashop.web.cmu.edu/KDDCup/. NIPS34: ... https://eedi.com/projects/neurips-education-challenge. Statics2011: ... https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=507. ASSISTments2015 (AS2015): ... https://sites.google.com/site/assistmentsdata/datasets/2015-assistments-skill-builder-data. POJ: ... https://drive.google.com/drive/folders/1LRljqWfODwTYRMPw6wEJ_mMt1KZ4xBDk.
Dataset Splits	Yes	Similar to (Liu et al., 2022), we randomly withhold 20% of the students sequences for model evaluation and we perform standard 5-fold cross validation on the rest 80% of each dataset.
Hardware Specification	Yes	Our model is implemented in PyTorch and trained on NVIDIA RTX 3090 GPU device.
Software Dependencies	No	The paper states 'Our model is implemented in PyTorch' but does not specify a version number for PyTorch or any other software dependencies.
Experiment Setup	Yes	The embedding dimension, the hidden state dimension, the two dimension of the prediction layers are set to [64, 128], the learning rate and dropout rate are set to [1e-3, 1e-4, 1e-5] and [0.05, 0.1, 0.3, 0.5] respectively, the number of blocks and attention heads are set to [1, 2, 4] and [4, 8], the seed is set to [42, 3407] for reproducing the experimental results.
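
Illustrative sketch (not taken from the paper or from pykt-toolkit): the Dataset Splits and Experiment Setup rows above describe a 20% student-level hold-out, 5-fold cross validation on the remaining 80%, and a set of candidate hyperparameter values. The Python below shows one way such a protocol could be laid out. The function names, the dictionary keys, the use of seed 42 for shuffling, and the assumption that every listed value is combined in a full grid are all hypothetical; the quoted text does not say how the search was carried out or whether the four dimension settings were tuned independently.

```python
import itertools
import random


def split_students(student_ids, test_frac=0.2, n_folds=5, seed=42):
    """Hold out a fraction of students, then spread the rest over n_folds folds.

    Hypothetical helper: splitting is done per student so that all of a
    student's interactions land in exactly one partition.
    """
    ids = sorted(set(student_ids))
    random.Random(seed).shuffle(ids)
    n_test = int(round(len(ids) * test_frac))
    test_ids, rest = ids[:n_test], ids[n_test:]
    # Round-robin assignment keeps fold sizes within one student of each other.
    folds = [rest[i::n_folds] for i in range(n_folds)]
    return test_ids, folds


def cv_rounds(folds):
    """Yield (train_ids, val_ids), using each fold once for validation."""
    for k, val_ids in enumerate(folds):
        train_ids = [s for j, fold in enumerate(folds) if j != k for s in fold]
        yield train_ids, val_ids


# Candidate values as quoted in the Experiment Setup row; key names are assumed.
SEARCH_SPACE = {
    "embedding_dim": [64, 128],
    "hidden_dim": [64, 128],
    "pred_dim_1": [64, 128],
    "pred_dim_2": [64, 128],
    "learning_rate": [1e-3, 1e-4, 1e-5],
    "dropout": [0.05, 0.1, 0.3, 0.5],
    "num_blocks": [1, 2, 4],
    "num_heads": [4, 8],
    "seed": [42, 3407],
}


def iter_configs(space):
    """Enumerate every combination of the candidate values (full grid)."""
    keys = list(space)
    for values in itertools.product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))
```

A run under these assumptions would call split_students once per dataset, loop over cv_rounds for each configuration from iter_configs, and touch the held-out students only for the final evaluation.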