Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Closed-Form Solution for Fast and Reliable Adaptive Testing

Authors: Yan Zhuang, Chenye Ke, Zirui Liu, Qi Liu, Yuting Ning, Zhenya Huang, Weizhe Huang, Qingyang Mao, Shijin Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on large-scale educational datasets demonstrate that it reduces the number of required questions by 10% compared to SOTA methods, while maintaining the same estimation accuracy.
Researcher Affiliation	Collaboration	Yan Zhuang1, Chenye Ke2, Zirui Liu1, Qi Liu1,3, Yuting Ning4, Zhenya Huang1,3, Weizhe Huang1, Qingyang Mao1, Shijin Wang5. 1: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; 2: Anhui University; 3: Institute of Artificial Intelligence, Hefei Comprehensive National Science Center; 4: Ohio State University; 5: iFLYTEK Co., Ltd.
Pseudocode	Yes	Algorithm 1: The proposed framework CFAT; Algorithm 2: Full Procedure of CFAT; Algorithm 3: Full Procedure of CFAT (Approximate)
Open Source Code	Yes	The implementation code is available on: https://github.com/54zy/CFAT.
Open Datasets	Yes	We conduct experiments on three widely used educational testing benchmark datasets: ASSIST, NIPS-EDU, and EXAM. ASSIST [33] is derived from the online educational platform ASSISTments and contains examinees' practice logs on mathematics; NeurIPS-EDU [34] originates from the NeurIPS 2020 Education Challenge, comprising a large-scale dataset collected from examinees' responses to questions on Eedi, an educational platform. EXAM is a dataset from iFLYTEK Co., Ltd. that records junior high school students' performances on mathematical exams.
Dataset Splits	Yes	Following [9, 1], we split examinees into 70% training, 20% validation, and 10% testing.
Hardware Specification	Yes	All methods are implemented in PyTorch and trained on a Tesla V100 GPU.
Software Dependencies	No	All methods are implemented in PyTorch and trained on a Tesla V100 GPU. Hyperparameters are tuned via grid search, with batch size 64, learning rate 0.001, and behavioral noise parameters π_g = 0.002, π_s = 0.001. Optimization is performed using Adam.
Experiment Setup	Yes	Hyperparameters are tuned via grid search, with batch size 64, learning rate 0.001, and behavioral noise parameters π_g = 0.002, π_s = 0.001. Optimization is performed using Adam.
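The Dataset Splits entry says the 70/20/10 split is made over examinees rather than over individual responses, so no examinee's answers land in more than one partition and test examinees are unseen during training. The sketch below illustrates that idea under stated assumptions: the (examinee_id, question_id, correct) log layout, the fixed shuffling seed, and the helper names split_examinees and partition_logs are hypothetical and are not taken from the CFAT repository.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical log layout: (examinee_id, question_id, correct), where correct is 0 or 1.
Log = Tuple[int, int, int]


def split_examinees(examinee_ids, train_frac: float = 0.7, val_frac: float = 0.2, seed: int = 0):
    """Shuffle unique examinees and cut them into disjoint train/val/test groups.

    IDs must be sortable (e.g. ints or strings); sorting first makes the result
    depend only on the seed, not on the order the IDs arrive in.
    """
    ids = sorted(set(examinee_ids))
    random.Random(seed).shuffle(ids)
    n_train = round(len(ids) * train_frac)
    n_val = round(len(ids) * val_frac)
    # Whatever remains (about 10% with the defaults) becomes the test group.
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]


def partition_logs(logs: Sequence[Log], train_ids, val_ids, test_ids) -> Dict[str, List[Log]]:
    """Route every response to the split that owns its examinee."""
    owner = {}
    for name, group in (("train", train_ids), ("val", val_ids), ("test", test_ids)):
        for examinee in group:
            owner[examinee] = name
    out: Dict[str, List[Log]] = {"train": [], "val": [], "test": []}
    for log in logs:
        out[owner[log[0]]].append(log)
    return out


if __name__ == "__main__":
    # Toy data: 100 examinees, each answering 5 questions.
    logs = [(e, q, (e + q) % 2) for e in range(100) for q in range(5)]
    train, val, test = split_examinees(e for e, _, _ in logs)
    parts = partition_logs(logs, train, val, test)
    print(len(train), len(val), len(test))  # 70 20 10
    print({k: len(v) for k, v in parts.items()})  # {'train': 350, 'val': 100, 'test': 50}
```

Splitting by examinee rather than by response is the design choice that matters here: a response-level split would let the model see part of a test examinee's history during training, which would overstate how well ability can be estimated for a new examinee.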
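The Experiment Setup entry lists the reported hyperparameters but does not say how the behavioral noise parameters enter the model. The sketch below collects the reported values in a config object and shows one common way such parameters are used in cognitive diagnosis: reading π_g as a guessing rate and π_s as a slip rate layered on a Rasch (1PL) item response. That reading, the 1PL form, and the names ExperimentConfig and noisy_correct_prob are assumptions for illustration only, not the paper's formulation.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    # Values as reported in the Experiment Setup entry.
    batch_size: int = 64
    learning_rate: float = 0.001
    optimizer: str = "Adam"
    pi_g: float = 0.002  # behavioral noise parameter; treated here as a guess rate (assumption)
    pi_s: float = 0.001  # behavioral noise parameter; treated here as a slip rate (assumption)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function (avoids overflow in math.exp)."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def noisy_correct_prob(theta: float, b: float, pi_g: float, pi_s: float) -> float:
    """Hypothetical guess/slip-corrected probability of a correct answer.

    The examinee "knows" an item of difficulty b with probability sigmoid(theta - b),
    slips on a known item with probability pi_s, and guesses an unknown item
    correctly with probability pi_g.
    """
    p_know = sigmoid(theta - b)
    return (1.0 - pi_s) * p_know + pi_g * (1.0 - p_know)


if __name__ == "__main__":
    cfg = ExperimentConfig()
    for theta in (-3.0, 0.0, 3.0):
        prob = noisy_correct_prob(theta, b=0.0, pi_g=cfg.pi_g, pi_s=cfg.pi_s)
        print(f"theta={theta:+.1f}  P(correct)={prob:.4f}")
```

Under this reading the probability of a correct answer is bounded between π_g (very low ability) and 1 − π_s (very high ability), so a single lucky guess or careless slip cannot push the estimate to an extreme. In a PyTorch training loop the remaining settings would typically map onto torch.optim.Adam(model.parameters(), lr=cfg.learning_rate) and torch.utils.data.DataLoader(dataset, batch_size=cfg.batch_size).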