Get a Head Start: On-Demand Pedagogical Policy Selection in Intelligent Tutoring

Authors: Ge Gao, Xi Yang, Min Chi

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on the Probability ITS, which has been used in real classrooms for over eight years. Our study shows significant improvement in the learning outcomes of students using EDUPLANNER, especially for those in low-performing subgroups.
Researcher Affiliation | Collaboration | Ge Gao (1), Xi Yang (2), Min Chi (1); (1) Department of Computer Science, North Carolina State University; (2) IBM Research. Emails: ggao5@ncsu.edu, xi.yang@ibm.com, mchi@ncsu.edu
Pseudocode | Yes | Algorithm 1: EDUPLANNER.
Open Source Code | No | The paper states 'For Dual DICE, we use the open-sourced code in its original paper.' This refers to a third-party tool's code, not the code for the methodology described in this paper (EDUPLANNER). There is no explicit statement or link indicating that the authors' own code is open-source.
Open Datasets | No | The paper describes using '459k historical logs from 1,148 students across five years' and mentions 'the Probability ITS... has been extensively used by over 2,000 students with 800k recorded interaction logs through eight academic years.' However, it does not provide a specific link, DOI, repository name, or formal citation (with authors and year) for public access to this dataset.
Dataset Splits | Yes | In this study, we used 459k historical logs from 1,148 students across five years for offline policy training and evaluation, and targeted the following semester using EDUPLANNER with the major goal of improving learning outcomes. ... In total, 140 students accomplished all procedures of the study. ... EDUPLANNER identified four subgroups (i.e., K1, K2, K3, K4) using historical data from the Probability ITS. Specifically, K1 (Nhis = 345, Ntest = 30) and K2 (Nhis = 678, Ntest = 92) are majority groups... K3 (Nhis = 101, Ntest = 12) and K4 (Nhis = 24, Ntest = 6) contained fewer samples...
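The subgroup counts quoted above are internally consistent: the historical sizes sum to the 1,148 students used for offline training, and the test sizes sum to the 140 students who completed all study procedures. A minimal sanity-check sketch (subgroup labels and counts taken directly from the excerpt above):

```python
# Subgroup sizes reported for EDUPLANNER (K1-K4), as quoted in the excerpt.
# n_his  = students in the historical (offline training) data per subgroup;
# n_test = students in the target semester per subgroup.
subgroups = {
    "K1": {"n_his": 345, "n_test": 30},  # majority group
    "K2": {"n_his": 678, "n_test": 92},  # majority group
    "K3": {"n_his": 101, "n_test": 12},  # smaller subgroup
    "K4": {"n_his": 24,  "n_test": 6},   # smaller subgroup
}

total_his = sum(g["n_his"] for g in subgroups.values())
total_test = sum(g["n_test"] for g in subgroups.values())

print(total_his)   # 1148 -- historical students across five years
print(total_test)  # 140  -- students who completed the study
```

This is only a bookkeeping check on the reported numbers, not part of the paper's method.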
Hardware Specification | No | The paper does not provide specific details about the hardware used for running its experiments, such as GPU models, CPU types, or specific cloud computing instances.
Software Dependencies | No | The paper mentions using specific tools like Dual DICE and MAGIC, but does not provide version numbers for any software dependencies, such as programming languages or libraries used in its implementation (e.g., 'For Dual DICE, we use the open-sourced code in its original paper.' and 'For MAGIC, we use the implementation of (Voloshin et al. 2021).').
Experiment Setup | No | The paper describes the action space size (3) and the number of repetitions for RRS (20 times), but it does not provide specific hyperparameters for model training, such as learning rates, batch sizes, number of epochs, or optimizer settings for the neural networks mentioned (e.g., in FQE).