Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
A Closed-Form Solution for Fast and Reliable Adaptive Testing
Authors: Yan Zhuang, Chenye Ke, Zirui Liu, Qi Liu, Yuting Ning, Zhenya Huang, Weizhe Huang, Qingyang Mao, Shijin Wang
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on large-scale educational datasets demonstrate that it reduces the number of required questions by 10% compared to SOTA methods, while maintaining the same estimation accuracy. |
| Researcher Affiliation | Collaboration | Yan Zhuang¹, Chenye Ke², Zirui Liu¹, Qi Liu¹,³, Yuting Ning⁴, Zhenya Huang¹,³, Weizhe Huang¹, Qingyang Mao¹, Shijin Wang⁵. 1: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; 2: Anhui University; 3: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; 4: Ohio State University; 5: iFLYTEK Co., Ltd. |
| Pseudocode | Yes | Algorithm 1: The proposed framework CFAT; Algorithm 2: Full Procedure of CFAT; Algorithm 3: Full Procedure of CFAT (Approximate) |
| Open Source Code | Yes | The implementation code is available on: https://github.com/54zy/CFAT. |
| Open Datasets | Yes | We conduct experiments on three widely used educational testing benchmark datasets: ASSIST, NIPS-EDU, and EXAM: ASSIST [33] is derived from the online educational platform ASSISTments and contains examinees' practice logs on mathematics; NeurIPS-EDU [34] originates from the NeurIPS 2020 Education Challenge, comprising a large-scale dataset collected from examinees' responses to questions on Eedi, an educational platform. EXAM is a dataset from iFLYTEK Co., Ltd. that records junior high school students' performances on mathematical exams. |
| Dataset Splits | Yes | Following [9, 1], we split examinees into 70% training, 20% validation, and 10% testing. |
| Hardware Specification | Yes | All methods are implemented in PyTorch and trained on a Tesla V100 GPU. |
| Software Dependencies | No | All methods are implemented in PyTorch and trained on a Tesla V100 GPU. Hyperparameters are tuned via grid search, with batch size 64, learning rate 0.001, and behavioral noise parameters πg = 0.002, πs = 0.001. Optimization is performed using Adam. |
| Experiment Setup | Yes | Hyperparameters are tuned via grid search, with batch size 64, learning rate 0.001, and behavioral noise parameters πg = 0.002, πs = 0.001. Optimization is performed using Adam. |
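
The examinee-level 70% / 20% / 10% split quoted in the Dataset Splits row can be illustrated with a minimal sketch. This is not the authors' code: the function name `split_examinees`, the fixed seed, and the use of integer examinee IDs are assumptions made for illustration only, and the actual repository may split the data differently.

```python
import random


def split_examinees(examinee_ids, seed=0):
    """Split examinee IDs into 70% train, 20% validation, 10% test.

    Hypothetical helper: the split is done per examinee, so all
    responses of one examinee end up in exactly one partition.
    """
    ids = list(examinee_ids)
    # Shuffle with a local, seeded generator so the split is repeatable.
    random.Random(seed).shuffle(ids)

    n = len(ids)
    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = n * 70 // 100
    n_val = n * 20 // 100

    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder, roughly 10%
    return train, val, test


if __name__ == "__main__":
    train, val, test = split_examinees(range(1000), seed=0)
    print(len(train), len(val), len(test))
```

Splitting by examinee rather than by individual response keeps every test examinee unseen during training, which matches the row's wording that examinees, not responses, are partitioned.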