Learning to Select from Multiple Options
Authors: Jiangshu Du, Wenpeng Yin, Congying Xia, Philip S. Yu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our methods are evaluated on three tasks (ultra-fine entity typing, intent detection, and multiple-choice QA) that are typical selection problems with different sizes of options. Experiments show our models set new SOTA performance; in particular, Parallel-TE is k times faster than the pairwise TE in inference. |
| Researcher Affiliation | Collaboration | Jiangshu Du1, Wenpeng Yin2, Congying Xia3, Philip S. Yu1 1University of Illinois at Chicago, Chicago, IL, USA 2Penn State University, State College, PA, USA 3Salesforce Research, Palo Alto, CA, USA |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the release of open-source code for the described methodology. |
| Open Datasets | Yes | Our experiments are conducted on three different tasks: ultrafine entity typing, few-shot intent detection and multiple-choice QA. We choose the three tasks since they represent different selection problems in NLP. Ultra-fine entity typing is a multi-label task with a large option space: over 10,000 entity types. The few-shot intent detection task evaluates our proposed models under few-shot selection scene. Multiple-choice QA is a selection problem which requires the model to understand long paragraphs. |
| Dataset Splits | Yes | The annotated examples are equally split into train, dev, and test. From the training data, we randomly sample 5-shot and 10-shot instances per intent as our respective training sets. We also sample a small portion of the training dataset as our dev set, following the previous setting (Zhang et al. 2021; Mehri, Eric, and Hakkani-Tür 2020). |
| Hardware Specification | Yes | Experiments run on an NVIDIA TITAN RTX. The inference speed is measured on an NVIDIA GeForce RTX 3090 with an evaluation batch size of 256. |
| Software Dependencies | No | The paper mentions using RoBERTa, BERT, and Sentence-BERT models but does not provide specific version numbers for underlying software dependencies such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | The hyperparameters, threshold τ, and k are searched on the dev set for each task. The inference speed is measured on an NVIDIA GeForce RTX 3090 with an evaluation batch size of 256. We train both models 5 epochs and report the test accuracy at the end of each epoch. |
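The Dataset Splits row describes sampling 5-shot and 10-shot instances per intent from the training data. The paper does not release code for this step, so the following is only a minimal sketch of per-class k-shot sampling under assumed names (`sample_k_shot`, `examples` as `(text, intent)` pairs); it is not the authors' implementation.

```python
import random
from collections import defaultdict

def sample_k_shot(examples, k, seed=0):
    """Randomly sample up to k examples per intent label.

    `examples` is a list of (text, intent) pairs; the function name
    and data layout are illustrative assumptions, not from the paper.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible subset
    by_intent = defaultdict(list)
    for text, intent in examples:
        by_intent[intent].append((text, intent))
    subset = []
    for intent, items in by_intent.items():
        # guard against intents with fewer than k training examples
        subset.extend(rng.sample(items, min(k, len(items))))
    return subset

# e.g. sample_k_shot(train_examples, 5) for the 5-shot setting,
#      sample_k_shot(train_examples, 10) for the 10-shot setting
```

A seeded `random.Random` instance keeps the sampled few-shot split stable across runs, which matters when comparing models on the same subset.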