Off-Policy Selection for Initiating Human-Centric Experimental Design

Authors: Ge Gao, Xi Yang, Qitong Gao, Song Ju, Miroslav Pajic, Min Chi

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | FPS is evaluated on two important but challenging applications: intelligent tutoring systems and a healthcare application for sepsis treatment and intervention. FPS delivers significant gains in students' learning outcomes and in in-hospital care outcomes.
Researcher Affiliation | Collaboration | Stanford University, IBM Research, Duke University, North Carolina State University. The work was done at North Carolina State University. Contact: gegao@stanford.edu, mchi@ncsu.edu.
Pseudocode | Yes | Algorithm 1 (FPS). Require: a set of target policies Π and an offline dataset D.
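For orientation, the signature above fits the common off-policy selection (OPS) skeleton: score each candidate policy against the offline dataset with an off-policy value estimator and return the best-scoring one. The sketch below is this generic skeleton only, not the FPS algorithm itself; the `estimate_value` callable is a hypothetical stand-in for an OPE method such as FQE.

```python
def select_policy(policies, dataset, estimate_value):
    """Generic OPS skeleton (assumption, not FPS itself).

    policies: dict mapping policy name -> policy object (the set Π)
    dataset: offline trajectories (the dataset D)
    estimate_value: callable (policy, dataset) -> estimated value,
        a stand-in for an off-policy evaluation method such as FQE.
    Returns the name of the highest-scoring policy.
    """
    scores = {name: estimate_value(pi, dataset) for name, pi in policies.items()}
    return max(scores, key=scores.get)
```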
Open Source Code | No | The NeurIPS Paper Checklist states that 'The authors provided the details of the code as part of their submissions via structured templates' and that 'Real-world human data is not publicly released under IRB protocols.' The paper does not provide a direct link or an unambiguous public release statement for the code implementing the described methodology.
Open Datasets | No | The paper uses data from 'a real-world IE system' involving '1,288 students participating over 5 years' and a simulated environment for sepsis treatment based on [48]. For the real-world data, the NeurIPS Paper Checklist states 'Real-world human data is not publicly released under IRB protocols.' For the simulated environment, the paper cites a source but provides no direct link or specific access details for the dataset itself.
Dataset Splits | No | The paper describes using data from 'the first 5 semesters' for training and 'the 6th semester' for testing in the IE experiment, reporting Ntrain and Ntest sample counts. However, it does not explicitly specify a separate validation dataset or its split percentages/counts.
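The temporal split described above can be sketched in a few lines. This is a hypothetical illustration of a train/test split by semester index; the `"semester"` field name and record structure are assumptions, not from the paper.

```python
def split_by_semester(episodes, test_semester=6):
    """Temporal train/test split as described in the review:
    semesters before `test_semester` go to training, the held-out
    semester goes to testing. Record layout is a hypothetical example.
    """
    train = [e for e in episodes if e["semester"] < test_semester]
    test = [e for e in episodes if e["semester"] == test_semester]
    return train, test
```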
Hardware Specification | Yes | All experimental workloads are distributed across 4 Nvidia RTX A5000 24GB, 3 Nvidia Quadro RTX 6000 24GB, and 4 NVIDIA TITAN Xp 12GB graphics cards.
Software Dependencies | No | The paper names specific algorithms and tools (e.g., FQE, DualDICE, MAGIC, the Adam optimizer, a DQN-based algorithm) but does not specify software versions for the programming languages, libraries, or frameworks used (e.g., Python, PyTorch/TensorFlow, scikit-learn).
Experiment Setup | Yes | For training, in subgroups with sample size greater than 200, the maximum number of iterations is set to 1000 and the minibatch size to 64; for subgroups with sample size less than or equal to 200, these are 200 and 4, respectively. The Adam optimizer is used to perform gradient descent. The learning rate is chosen by grid search over {1e-4, 3e-3, 3e-4, 5e-4, 7e-4}, and exponential decay multiplies the learning rate by 0.997 every iteration.
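The hyperparameter rules in this row are simple enough to express directly. The sketch below encodes only the values stated in the text (iteration/minibatch rule by subgroup size, the learning-rate grid, and the 0.997-per-iteration exponential decay); the function names are illustrative, and how FPS consumes these settings is not specified here.

```python
# Hyperparameter rules as stated in the review row (values from the text;
# function names are hypothetical).
LR_GRID = [1e-4, 3e-3, 3e-4, 5e-4, 7e-4]  # grid-search candidates
DECAY = 0.997                             # multiplicative decay per iteration

def training_config(subgroup_size):
    """(max_iterations, minibatch_size) by subgroup sample size."""
    return (1000, 64) if subgroup_size > 200 else (200, 4)

def lr_at(lr0, iteration):
    """Learning rate after `iteration` exponential-decay steps."""
    return lr0 * DECAY ** iteration
```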