Dealbreaker: A Nonlinear Latent Variable Model for Educational Data

Authors: Andrew Lan, Tom Goldstein, Richard Baraniuk, Christoph Studer

ICML 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that the dealbreaker model achieves comparable or better prediction performance compared to affine models on real-world educational datasets. We now demonstrate the prediction performance of the dealbreaker model on unobserved student responses using four real-world educational datasets. We furthermore showcase the interpretability of the dealbreaker model by visualizing the dealbreaker concept for each question.
Researcher Affiliation | Academia | Andrew Lan (SL29@RICE.EDU), Rice University; Tom Goldstein (TOMG@CS.UMD.EDU), University of Maryland; Richard Baraniuk (RICHB@RICE.EDU), Rice University; Christoph Studer (STUDER@CORNELL.EDU), Cornell University
Pseudocode | No | No clearly labeled pseudocode or algorithm blocks were found. The paper describes algorithmic steps and mathematical formulations for inference, but not in pseudocode form.
Open Source Code | No | The paper does not provide an explicit statement or link confirming the release of its source code for the described methodology.
Open Datasets | Yes | MT: N = 99 students answering Q = 34 questions in a high-school algebra test administered on Amazon's Mechanical Turk (Amazon, 2016)... UG: N = 92 students answering Q = 203 questions... CE: N = 1567 students answering Q = 60 questions... edX: N = 6403 students answering Q = 197 questions... MovieLens 100K dataset (Herlocker et al., 1999)
Dataset Splits | No | To reduce the identifiability issue of the dealbreaker model, we add the regularization term (λ/2)(Σ_{k,j} C_{k,j}² + Σ_{i,k} µ_{i,k}²) to the cost functions of both the hard and soft dealbreaker optimization problems and select the parameter λ using cross-validation. In each cross-validation run, we randomly leave out 20% of the student responses in the dataset (the unobserved data) and train the algorithms on the rest of the responses before testing their prediction performance on the unobserved data. (This split procedure is sketched below the table.)
Hardware Specification | Yes | For example, a single run of our Python code for the soft dealbreaker model on the UG dataset, with 92 students and 203 questions, takes only 10 s compared to 30 s for the hard dealbreaker model, on an Intel i7 laptop with a 2.8 GHz CPU and 8 GB memory.
Software Dependencies | No | The paper mentions 'our Python code' and states that 'For the Rasch model and the MIRT model, we perform inference using the R MIRT package (Chalmers, 2012).' However, specific version numbers for Python or the R MIRT package are not provided.
Experiment Setup | Yes | To reduce the identifiability issue of the dealbreaker model, we add the regularization term (λ/2)(Σ_{k,j} C_{k,j}² + Σ_{i,k} µ_{i,k}²) to the cost functions of both the hard and soft dealbreaker optimization problems and select the parameter λ using cross-validation. For the MIRT model, the DINA model, and both dealbreaker models, we use K ∈ {3, 6} concepts. We randomly initialize the variables Z^k_{i,j}, C_{k,j}, and µ_{i,k} for all i, j, k from the standard normal distribution, and initialize the Lagrange multipliers as Λ^k_{i,j} = 0 for all i, j, k. (An initialization sketch also follows the table.)
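
The 20% leave-out procedure quoted in the Dataset Splits row can be illustrated with a short Python sketch. The paper reports Python code but none is released, so everything here is our assumption: the make_split name, the NaN-for-missing convention, and the synthetic response matrix.

```python
import numpy as np

def make_split(Y, holdout_frac=0.2, seed=0):
    """Hide a random fraction of the observed entries of a
    (students x questions) response matrix; NaN marks missing data.
    Mirrors the paper's per-run 20% leave-out, under our conventions."""
    rng = np.random.default_rng(seed)
    rows, cols = np.nonzero(~np.isnan(Y))               # observed responses
    n_hold = int(round(holdout_frac * rows.size))
    held = rng.choice(rows.size, size=n_hold, replace=False)
    Y_train = Y.copy()
    Y_train[rows[held], cols[held]] = np.nan            # remove held-out entries
    return Y_train, (rows[held], cols[held])

# Toy example with the UG dataset's dimensions (92 students, 203 questions)
Y = (np.random.rand(92, 203) < 0.7).astype(float)      # synthetic binary responses
Y_train, test_idx = make_split(Y, holdout_frac=0.2, seed=42)
print(np.isnan(Y_train).sum(), "responses held out")
```

A model would then be fit on Y_train and its predictions scored at the held-out indices in test_idx, matching the paper's description of training on the remaining responses and testing on the unobserved data.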
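The initialization quoted in the Experiment Setup row is similarly easy to mirror in code. This is a minimal sketch under our own naming: init_variables and ridge_penalty are hypothetical helpers, and the (K, N, Q) array layout is one plausible arrangement of the paper's Z^k_{i,j}, C_{k,j}, µ_{i,k}, and Λ^k_{i,j}, not the authors' implementation.

```python
import numpy as np

def init_variables(N, Q, K, seed=0):
    """Standard-normal initialization of Z^k_{i,j}, C_{k,j}, and mu_{i,k},
    with Lagrange multipliers Lambda^k_{i,j} = 0, as quoted above.
    The (K, N, Q) layout is our choice, not the authors'."""
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((K, N, Q))   # slack variables Z^k_{i,j}
    C = rng.standard_normal((K, Q))      # question-concept parameters C_{k,j}
    mu = rng.standard_normal((N, K))     # student-concept parameters mu_{i,k}
    Lam = np.zeros((K, N, Q))            # Lagrange multipliers, initialized to zero
    return Z, C, mu, Lam

def ridge_penalty(C, mu, lam):
    """The quoted regularizer (lambda/2)(sum_{k,j} C_{k,j}^2 + sum_{i,k} mu_{i,k}^2)."""
    return 0.5 * lam * (np.sum(C**2) + np.sum(mu**2))

# Example with the UG dataset's dimensions and K = 3 concepts
Z, C, mu, Lam = init_variables(N=92, Q=203, K=3, seed=0)
print(ridge_penalty(C, mu, lam=0.1))
```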