Predictive, scalable and interpretable knowledge tracing on structured domains
Authors: Hanqi Zhou, Robert Bamler, Charley M Wu, Álvaro Tejero-Cantero
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated on three datasets from online learning platforms, PSI-KT achieves superior multi-step predictive accuracy and scalable inference in continual-learning settings, all while providing interpretable representations of learner-specific traits and the prerequisite structure of knowledge that causally supports learning. |
| Researcher Affiliation | Academia | Hanqi Zhou (1,2,4), Robert Bamler (1,3), Charley M. Wu (1,2,3), & Álvaro Tejero-Cantero (1,2). 1: University of Tübingen, 2: Cluster of Excellence Machine Learning, 3: Tübingen AI Center, 4: IMPRS-IS. {hanqi.zhou,robert.bamler,charley.wu,alvaro.tejero}@uni-tuebingen.de |
| Pseudocode | No | The paper provides detailed mathematical formulations and equations for its model and inference method (e.g., Eqs. 1-10), along with a graphical overview (Fig. 7 in Appendix A.4). However, it does not include a distinct block or figure explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Code at github.com/mlcolab/psi-kt |
| Open Datasets | Yes | We use Assistments 2012 and 2017 datasets (Assist12 and Assist17) and Junyi's 2015 dataset (Junyi15; Chang et al., 2015), which in addition to interaction data, provides human-annotated KC relations (see Table 1 and Appendix A.3.2 for details). Assistments data: https://sites.google.com/site/assistmentsdata ; Junyi15 data: https://pslcdatashop.web.cmu.edu/DatasetInfo?datasetId=1198 |
| Dataset Splits | Yes | In our evaluations, we mainly focus on prediction and generalization when training on 10 interactions from up to 1000 learners. The between-learner generalization accuracy of the models above, when tested on 100 out-of-sample learners, is shown in Table 2, where fine-tuning indicates that parameters were updated using (10-point) learning histories from the unseen learners. (An illustrative split sketch follows this table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running experiments, such as GPU models, CPU specifications, or memory configurations. It mentions that the model 'scales well' but provides no technical specifications for the computational environment. |
| Software Dependencies | No | The paper discusses the use of 'deep learning methods', 'recurrent neural networks', 'LSTM networks', and 'graph neural networks'. However, it does not specify any particular software versions for frameworks (e.g., PyTorch, TensorFlow), programming languages (e.g., Python), or libraries used in the implementation of the model or experiments. |
| Experiment Setup | Yes | In our evaluations, we mainly focus on prediction and generalization when training on 10 interactions from up to 1000 learners. Good KT performance with little data is key in practical ITS to minimize the number of learners on an experimental treatment (principle of equipoise, similar to medical research; Burkholder, 2021), to mitigate the cold-start problem, and to extend the usefulness of the model to classroom-size groups. To provide ITS with a basis for adaptive guidance and long-term learner assessment, we always predict the 10 next interactions. ... Each model is initially trained on 10 interactions from 100 learners. We then incrementally provide one data point from each learner, and evaluate the training costs and prediction accuracy. (An illustrative continual-evaluation sketch follows this table.) |
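The Dataset Splits row quotes the protocol of training on 10-interaction histories from up to 1000 learners and testing between-learner generalization on 100 held-out learners, optionally fine-tuning on their 10-point histories. As a rough illustration only, the sketch below shows one way such a split could be constructed; the data layout, the `split_learners` helper, and all parameter names are assumptions, not the released pipeline at github.com/mlcolab/psi-kt.

```python
import random

def split_learners(histories, n_train=1000, n_test=100, n_obs=10, n_pred=10, seed=0):
    """Hypothetical split: `histories` maps learner_id -> time-ordered list of
    (kc_id, correct, timestamp) interactions."""
    rng = random.Random(seed)
    eligible = [l for l, h in histories.items() if len(h) >= n_obs + n_pred]
    rng.shuffle(eligible)
    train_ids = eligible[:n_train]
    test_ids = eligible[n_train:n_train + n_test]

    train = {l: histories[l][:n_obs] for l in train_ids}        # 10 observed interactions per learner
    test_obs = {l: histories[l][:n_obs] for l in test_ids}      # 10-point histories for optional fine-tuning
    test_target = {l: histories[l][n_obs:n_obs + n_pred] for l in test_ids}  # the next 10 interactions to predict
    return train, test_obs, test_target
```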
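The Experiment Setup row quotes the continual-learning protocol: initial training on 10 interactions from 100 learners, then one additional data point per learner at a time, always predicting the next 10 interactions. The sketch below outlines that evaluation loop under assumed interfaces; `model.fit`, `model.predict`, and `multi_step_accuracy` are hypothetical placeholders, not the paper's actual code.

```python
def multi_step_accuracy(preds, targets):
    """Fraction of correct binary predictions over the horizon.
    `preds[l]` are probabilities of a correct response; `targets[l]` are
    (kc_id, correct, timestamp) tuples."""
    pairs = [(p, t[1]) for l in targets for p, t in zip(preds[l], targets[l])]
    return sum(int((p >= 0.5) == bool(c)) for p, c in pairs) / max(len(pairs), 1)

def continual_evaluation(model, histories, n_learners=100, n_init=10, n_pred=10, n_steps=20):
    """Hypothetical continual-learning protocol: start from `n_init` interactions
    per learner, reveal one more interaction per learner at each step, update the
    model, and score prediction of the next `n_pred` interactions."""
    learners = list(histories)[:n_learners]
    results = []
    for step in range(n_steps):
        seen = n_init + step
        observed = {l: histories[l][:seen] for l in learners}
        targets = {l: histories[l][seen:seen + n_pred] for l in learners}
        model.fit(observed)                              # incremental update on newly revealed data
        preds = model.predict(observed, horizon=n_pred)  # probabilities for the next n_pred responses
        results.append((seen, multi_step_accuracy(preds, targets)))
    return results
```

At each step the loop records the number of observed interactions and the resulting multi-step accuracy, mirroring how the paper tracks training cost and prediction accuracy as data arrive incrementally.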