Eliciting Thinking Hierarchy without a Prior

Authors: Yuqing Kong, Yunqi Li, Yubo Zhang, Zhihuan Huang, Jinzhao Wu

NeurIPS 2022

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental
    "In addition to theoretic justifications, we conduct four empirical crowdsourcing studies and show that a) the accuracy of the top-ranking answers learned by our approach is much higher than that of plurality voting... b) our model has a high goodness-of-fit, especially for the questions where our top-ranking answer is correct. More empirical results will be illustrated in Section 3.1. We show the superiority of our algorithm by comparing our algorithm to plurality voting by the accuracy of the top-ranking answers. We also test the goodness-of-fit of our model based on the collected data set."
Researcher Affiliation: Academia
    Yuqing Kong, CFCS and School of Computer Science, Peking University, yuqing.kong@pku.edu.cn
    Yunqi Li, CFCS and School of EECS, Peking University, liyunqi@pku.edu.cn
    Yubo Zhang, CFCS and School of Computer Science, Peking University, falsyta@pku.edu.cn
    Zhihuan Huang, CFCS and School of Computer Science, Peking University, zhihuan.huang@pku.edu.cn
    Jinzhao Wu, CFCS and School of EECS, Peking University, jinzhao.wu@pku.edu.cn
Pseudocode: Yes
    "Pseudo-codes are attached in Appendix B."
Open Source Code: No
    The main text of the paper neither states that code is released nor links to a repository for the described methodology. The author checklist answers 'Yes' to including code for reproduction, but that is meta-information about the paper rather than a statement in the paper's body or a concrete link.
Open Datasets: No
    "We conduct four studies, study 1 (35 math problems), study 2 (30 Go problems), study 3 (44 general knowledge questions), and study 4 (43 Chinese character pronunciation questions). ... Data collection: All studies are performed by online questionnaires. We recruit the respondents by an online announcement or from an online platform that is similar to Amazon Mechanical Turk." The paper describes collecting its own data and does not mention using, or providing access to, a publicly available dataset.
Dataset Splits: No
    The paper describes data collection and processing, including merging answers and omitting low-frequency answers, but it does not specify any training, validation, or test splits for model reproduction or evaluation.
Hardware Specification: No
    The paper gives no details about the hardware used to run the experiments, such as CPU or GPU models, memory, or cloud computing resources, and the author checklist marks compute resources as 'N/A'.
Software Dependencies: No
    The paper names the algorithms used (a 'dynamic programming based algorithm' and 'NCT based answer-ranking algorithms') but does not specify any particular software, libraries, or version numbers needed to replicate the experimental setup.
Experiment Setup: No
    The paper describes the proposed model and algorithms, as well as data collection and processing, but it provides no experimental setup details such as hyperparameters, learning rates, batch sizes, or optimizers. The author checklist also marks training details and hyperparameters as 'N/A'.
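For context, the baseline the paper compares against, plurality voting, simply selects the most frequent answer among respondents. The paper's own code is not released, so the following is only a minimal illustrative sketch of that baseline (the function name and example responses are hypothetical, not taken from the paper):

```python
from collections import Counter

def plurality_vote(answers):
    """Return the most frequent answer among respondents.

    Ties are broken by first occurrence, as Counter.most_common
    preserves insertion order for equal counts.
    """
    return Counter(answers).most_common(1)[0][0]

# Hypothetical crowdsourced responses to a single question
responses = ["A", "B", "A", "C", "A", "B"]
print(plurality_vote(responses))  # -> A
```

The paper's contribution is precisely that its top-ranking answers are more accurate than this majority-based aggregate, which can fail when the most common answer is a common misconception.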