Switchable Decision: Dynamic Neural Generation Networks
Authors: Shujian Zhang, Korawat Tanwisuth, Chengyue Gong, Pengcheng He, Mingyuan Zhou
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments across question answering, summarization, and classification benchmarks show that our method benefits from less computation cost during inference while keeping the same accuracy. |
| Researcher Affiliation | Academia | 1The University of Texas at Austin. Correspondence to: Shujian Zhang <szhang19@utexas.edu>. |
| Pseudocode | Yes | Algorithm 1 Switchable Decision (SD) |
| Open Source Code | No | The paper uses and cites external libraries like Fairseq and Hugging Face Transformers but does not provide a specific link or explicit statement about the availability of the source code for their proposed 'Switchable Decision' method. |
| Open Datasets | Yes | Summarization. We use CNN/Daily Mail (Hermann et al., 2015) and XSum (Narayan et al., 2018) to evaluate our method. Question Answering. The Stanford Question Answering Datasets (SQuAD) v1.1 and v2.0 (Rajpurkar et al., 2016; 2018; Fan et al., 2020) are popular machine reading comprehension benchmarks. Classification. The General Language Understanding Evaluation (GLUE) benchmark is a collection of natural language understanding (NLU) tasks. As shown in Table 1, we include Multi-Genre NLI (MNLI; Williams et al., 2017b; Zhang et al., 2021d), Recognizing Textual Entailment (RTE; Dagan et al., 2005), and Stanford Sentiment Treebank (SST; Socher et al., 2013). |
| Dataset Splits | Yes | Table 1. Dataset Configuration (train / validation / test). Summarization: CNN/Daily Mail 287.2K / 13.4K / 11.5K; XSum 204K / 11.3K / 11.3K. Question Answering: SQuAD 1.1 87.6K / 10.5K / 9.5K; SQuAD 2.0 130.3K / 11.9K / 8.9K. Classification: RTE 2.5K / 276 / 3K; MNLI 393K / 20K / 20K; SST 67K / 872 / 1.8K. |
| Hardware Specification | Yes | Experiments in this part are performed on eight Tesla V100 GPUs. |
| Software Dependencies | No | The paper mentions using 'Fairseq library' and 'Hugging Face Transformer library' and the 'Adam optimizer' but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | Following Lewis et al. (2019), we take the pre-trained BART model as the backbone and utilize the provided checkpoint for finetuning on the downstream datasets...Specifically, in summarization, we set the training steps as 50k and the number of warm-up steps as 500. The max number of tokens and the update frequency are set to be 2,048 and 4, respectively. The learning rate is set to 3 × 10⁻⁵. For question answering (SQuAD 1.1/2.0), we set the total number of updates and warm-up updates as 5,430 and 326, respectively. The max number of sentences is 3 per device with an update frequency of 2. The learning rate is 1.5 × 10⁻⁵. We refer the readers to Appendix A for classification hyper-parameter configurations, and more details about the settings. ... Table 15. Experiment setting for MNLI, RTE, and SST-2 (LR: learning rate, BSZ: batch size, NC: number of classes, TS: total number of training steps, WS: warm-up steps). |
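The quoted summarization setup (learning rate 3 × 10⁻⁵, 50K training steps, 500 warm-up steps) implies a warm-up learning-rate schedule. A minimal sketch, assuming linear warm-up followed by linear decay (Fairseq's `polynomial_decay` with power 1 is typical for BART fine-tuning, but the schedule shape is not stated in the excerpt):

```python
def lr_at_step(step, peak_lr=3e-5, warmup_steps=500, total_steps=50_000):
    """Learning rate at a given update step.

    Hyper-parameter values follow the paper's summarization setup;
    the linear warm-up / linear decay shape itself is an assumption,
    not confirmed by the excerpt.
    """
    if step < warmup_steps:
        # Linear warm-up from 0 to peak_lr over the first warmup_steps.
        return peak_lr * step / warmup_steps
    # Linear decay from peak_lr at warmup_steps down to 0 at total_steps.
    remaining = (total_steps - step) / (total_steps - warmup_steps)
    return peak_lr * max(remaining, 0.0)
```

For example, `lr_at_step(250)` gives 1.5e-05 (halfway through warm-up) and `lr_at_step(50_000)` gives 0.0.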