Predictive Student Modeling in Educational Games with Multi-Task Learning

Authors: Michael Geden, Andrew Emerson, Jonathan Rowe, Roger Azevedo, James Lester (pp. 654-661)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using sequential representations of student gameplay, results show that multi-task stacked LSTMs with residual connections significantly outperform baseline models that do not use the multi-task formulation. Additionally, the accuracy of predictive student models is improved as the number of tasks increases. These findings have significant implications for the design and development of predictive student models in adaptive learning environments. (A hedged sketch of such a multi-task stacked LSTM appears below the table.)
Researcher Affiliation | Academia | Michael Geden (1), Andrew Emerson (1), Jonathan Rowe (1), Roger Azevedo (2), James Lester (1); (1) North Carolina State University, (2) University of Central Florida
Pseudocode | No | The paper describes the model architectures using text and figures (Figures 2 and 3) but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the public availability of its source code.
Open Datasets | No | In this work, we used data from two different samples of students across different contexts (laboratory and classroom) to increase the heterogeneity of the sample and the generalizability of the resulting model (Sawyer et al. 2018). Students from both samples answered the same pre- and post-test surveys, but there were some differences in the experimental setup and game. Combining the data from the university-based laboratory study (n = 62) with the data from the classroom-based study (n = 119), the total sample size of the dataset is 181 students. The paper does not indicate that this dataset is publicly available or provide access information.
Dataset Splits | Yes | All models were trained and evaluated using 10-fold cross-validation along the same set of students to remove noise from sampling differences. In conducting the cross-validation, we ensured that no student data occurred both in the training and test sets. Hyperparameter tuning was conducted for each of the models within the 10-fold cross-validation. Continuous data were standardized within each of the folds. (A sketch of this grouped cross-validation protocol appears below the table.)
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU model, CPU type, memory) used to run the experiments. GPU use is at most implied by the neural network training; it is never stated explicitly.
Software Dependencies | No | The paper describes the model components and activation functions (e.g., "LSTM", "sigmoid activation function"), but it does not specify any software names with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for reproducibility.
Experiment Setup | Yes | Every model was hyperparameter tuned using a grid search: number of LSTM units (32, 64, 128), number of dense units (32, 64, 128), and dropout rate (.33, .66). The best model was selected using the validation data and reported using the 10-fold test data. All models used early stopping using mean squared error with a patience of 15 and 500 maximum epochs. (A sketch of this grid search appears below the table.)
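
To make the Research Type row above more concrete, here is a minimal sketch of a multi-task stacked LSTM with a residual connection, written in Keras/TensorFlow. The sequence length, feature count, number of task heads, layer sizes, and the exact placement of the residual connection are assumptions for illustration only, not the authors' published architecture (which is described in their Figures 2 and 3).

from tensorflow.keras import layers, Model

SEQ_LEN, N_FEATURES, N_TASKS = 50, 20, 4   # illustrative shapes, not from the paper

inputs = layers.Input(shape=(SEQ_LEN, N_FEATURES))
h1 = layers.LSTM(64, return_sequences=True)(inputs)    # first stacked LSTM layer
h2 = layers.LSTM(64, return_sequences=True)(h1)        # second stacked LSTM layer
res = layers.Add()([h1, h2])                           # residual (skip) connection
last = layers.Lambda(lambda t: t[:, -1, :])(res)       # representation at the final timestep
shared = layers.Dropout(0.33)(last)
# One sigmoid output head per task (e.g., one post-test item per head); sigmoid and
# mean squared error match the activation and loss named in the paper.
outputs = [layers.Dense(1, activation="sigmoid", name=f"task_{i}")(shared)
           for i in range(N_TASKS)]
model = Model(inputs, outputs)
model.compile(optimizer="adam", loss="mse")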
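
The Dataset Splits row maps naturally onto scikit-learn's GroupKFold, which guarantees that no student appears in both the training and test sets; the sketch below also standardizes continuous features within each fold, as reported. The data shapes, feature counts, and variable names are placeholders, not the authors' pipeline.

import numpy as np
from sklearn.model_selection import GroupKFold
from sklearn.preprocessing import StandardScaler

X = np.random.rand(181 * 5, 20)              # placeholder: 5 feature rows per student
y = np.random.rand(181 * 5)                  # placeholder targets
student_ids = np.repeat(np.arange(181), 5)   # group label = student

gkf = GroupKFold(n_splits=10)
for train_idx, test_idx in gkf.split(X, y, groups=student_ids):
    scaler = StandardScaler().fit(X[train_idx])   # standardize within the fold
    X_train = scaler.transform(X[train_idx])
    X_test = scaler.transform(X[test_idx])
    # ... fit the model on (X_train, y[train_idx]) and evaluate on (X_test, y[test_idx])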
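
Finally, the Experiment Setup row translates into a straightforward grid search with Keras early stopping (patience 15, up to 500 epochs, mean squared error as the monitored loss). The build_model helper, input shapes, and toy data below are assumptions included only to keep the sketch self-contained.

from itertools import product

import numpy as np
from tensorflow.keras import layers, Model
from tensorflow.keras.callbacks import EarlyStopping

def build_model(lstm_units, dense_units, dropout):
    # Hypothetical single-output builder standing in for the paper's architectures.
    inp = layers.Input(shape=(50, 20))              # illustrative sequence shape
    h = layers.LSTM(lstm_units)(inp)
    h = layers.Dense(dense_units, activation="relu")(h)
    h = layers.Dropout(dropout)(h)
    out = layers.Dense(1, activation="sigmoid")(h)
    model = Model(inp, out)
    model.compile(optimizer="adam", loss="mse")
    return model

# Toy data standing in for the gameplay sequences and post-test targets.
X_train, y_train = np.random.rand(100, 50, 20), np.random.rand(100)
X_val, y_val = np.random.rand(20, 50, 20), np.random.rand(20)

best_val, best_config = float("inf"), None
for lstm_units, dense_units, dropout in product([32, 64, 128], [32, 64, 128], [0.33, 0.66]):
    model = build_model(lstm_units, dense_units, dropout)
    early_stop = EarlyStopping(monitor="val_loss", patience=15, restore_best_weights=True)
    history = model.fit(X_train, y_train, validation_data=(X_val, y_val),
                        epochs=500, callbacks=[early_stop], verbose=0)
    val_mse = min(history.history["val_loss"])
    if val_mse < best_val:
        best_val, best_config = val_mse, (lstm_units, dense_units, dropout)
print("best configuration (lstm_units, dense_units, dropout):", best_config)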