Off-Policy Selection for Initiating Human-Centric Experimental Design
Authors: Ge Gao, Xi Yang, Qitong Gao, Song Ju, Miroslav Pajic, Min Chi
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | FPS is evaluated in two important but challenging applications: intelligent tutoring systems and a healthcare application for sepsis treatment and intervention. FPS yields significant improvements in students' learning outcomes and in in-hospital care outcomes. |
| Researcher Affiliation | Collaboration | Stanford University, IBM Research, Duke University, and North Carolina State University. The work was done at North Carolina State University. Contact: gegao@stanford.edu, mchi@ncsu.edu. |
| Pseudocode | Yes | Algorithm 1: FPS. Require: a set of target policies Π and an offline dataset D. (A hedged, generic selection-loop sketch matching these inputs appears below the table.) |
| Open Source Code | No | The paper states in the NeurIPS Paper Checklist that 'The authors provided the details of the code as part of their submissions via structured templates' and that 'Real-world human data is not publicly released under IRB protocols.' It does not provide a direct link or an unambiguous public release statement for the code of their described methodology. |
| Open Datasets | No | The paper uses data from 'a real-world IE system' involving '1,288 students participating over 5 years' and a 'simulated environment' for sepsis treatment based on [48]. For the real-world data, the NeurIPS Paper Checklist states 'Real-world human data is not publicly released under IRB protocols.' For the simulated environment, it cites a paper but does not provide a direct link or specific access details to the dataset itself. |
| Dataset Splits | No | The paper describes using data from 'the first 5 semesters' for training and 'the 6-th semester' for testing in the IE experiment, providing 'Ntrain' and 'Ntest' sample counts. However, it does not explicitly specify a separate validation dataset or its split percentages/counts. |
| Hardware Specification | Yes | All experimental workloads are distributed among 4 Nvidia RTX A5000 24GB, 3 Nvidia Quadro RTX 6000 24GB, and 4 NVIDIA TITAN Xp 12GB graphics cards. |
| Software Dependencies | No | The paper mentions using specific algorithms and tools (e.g., 'FQE', 'DualDICE', 'MAGIC', 'Adam optimizer', 'DQN-based algorithm') but does not specify software versions for programming languages, libraries, or frameworks used (e.g., Python, PyTorch/TensorFlow, scikit-learn versions). |
| Experiment Setup | Yes | For training, in subgroups with sample size greater than 200, the maximum number of iterations is set to 1000 and the minibatch size to 64; for subgroups with sample size less than or equal to 200, these are set to 200 and 4, respectively. The Adam optimizer is used to perform gradient descent. To determine the learning rate, a grid search is performed over {1e-4, 3e-3, 3e-4, 5e-4, 7e-4}. Exponential decay is applied, multiplying the learning rate by 0.997 every iteration. (A hedged configuration sketch appears below the table.) |
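As a reading aid for the Pseudocode row, the following is a minimal, hypothetical sketch of a generic off-policy selection loop that matches Algorithm 1's stated inputs (a set of target policies Π and an offline dataset D). It is not the paper's FPS procedure; `estimate_value` is a placeholder for an off-policy evaluation routine such as the FQE, DualDICE, or MAGIC estimators the paper mentions.

```python
from typing import Any, Callable, List, Sequence

def select_policy(
    policies: Sequence[Any],                                # candidate set Π
    dataset: List[dict],                                    # offline trajectories D
    estimate_value: Callable[[Any, List[dict]], float],     # hypothetical OPE hook (e.g., FQE)
) -> Any:
    """Generic off-policy selection skeleton: score each candidate policy on the
    offline data and return the highest-scoring one. Not the paper's FPS algorithm."""
    scores = [estimate_value(pi, dataset) for pi in policies]
    best_idx = max(range(len(policies)), key=lambda i: scores[i])
    return policies[best_idx]
```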
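The Experiment Setup row can also be summarized in code. The sketch below encodes the reported hyperparameters (iteration budget, minibatch size, learning-rate grid, and the 0.997 per-iteration exponential decay); the use of PyTorch and the function names are assumptions, not details taken from the paper.

```python
import torch

def training_config(subgroup_size: int) -> dict:
    """Hyperparameters as reported in the paper; the dict layout is illustrative."""
    if subgroup_size > 200:
        max_iterations, minibatch_size = 1000, 64
    else:
        max_iterations, minibatch_size = 200, 4
    return {
        "max_iterations": max_iterations,
        "minibatch_size": minibatch_size,
        "lr_grid": [1e-4, 3e-3, 3e-4, 5e-4, 7e-4],  # learning-rate grid-search candidates
        "lr_decay": 0.997,                           # exponential decay applied every iteration
    }

def make_optimizer(model: torch.nn.Module, lr: float):
    """Adam with per-iteration exponential learning-rate decay (PyTorch assumed)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.997)
    return optimizer, scheduler
```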