Block-Skim: Efficient Question Answering for Transformer

Authors: Yue Guan, Zhengyi Li, Zhouhan Lin, Yuhao Zhu, Jingwen Leng, Minyi Guo

AAAI 2022, pp. 10710-10719 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Evaluation, Experimental Setup: We evaluate our method on 6 extractive QA datasets, including SQuAD 1.1... Table 1 shows the result on multiple QA datasets.
Researcher Affiliation | Academia | Yue Guan (1,2), Zhengyi Li (1,2), Zhouhan Lin (1), Yuhao Zhu (3), Jingwen Leng (1,2), Minyi Guo (1,2); 1 Shanghai Jiao Tong University, 2 Shanghai Qi Zhi Institute, 3 University of Rochester
Pseudocode | No | The paper describes its method using prose and diagrams (Figure 1, Figure 4) but does not include any formal pseudocode or algorithm blocks.
Open Source Code | Yes | We implement the proposed method based on the open-sourced library from Wolf et al. (2019). ... The source code is available at https://github.com/ChandlerGuan/blockskim. (See the baseline-loading sketch below the table.)
Open Datasets | Yes | We evaluate our method on 6 extractive QA datasets, including SQuAD 1.1 (Rajpurkar et al. 2016), Natural Questions (Kwiatkowski et al. 2019), TriviaQA (Joshi et al. 2017), NewsQA (Trischler et al. 2016), SearchQA (Dunn et al. 2017) and HotpotQA (Yang et al. 2018). (See the dataset-loading sketch below the table.)
Dataset Splits | Yes | We evaluate our method on 6 extractive QA datasets, including SQuAD 1.1 (Rajpurkar et al. 2016)... The attention heatmaps are profiled on the development set of the SQuAD dataset with a BERT-base model...
Hardware Specification | Yes | We use four V100 GPUs with 32 GB memory for the training experiments.
Software Dependencies | No | The paper mentions using an 'open-sourced library from Wolf et al. (2019)' (Hugging Face Transformers) and 'Torch Profile (Liu 2020)', but it does not specify explicit version numbers for these software dependencies or other libraries.
Experiment Setup | Yes | We initialize the learning rate to 3e-5 for BERT models and 5e-5 for ALBERT with a linear learning rate scheduler. For the SQuAD dataset, we apply batch size 16 and maximum sequence length 384; for the other datasets, we apply batch size 32 and maximum sequence length 512. We perform all the experiments reported with random seed 42. ... The balance factor β is determined by block sample numbers and reported in Tbl. 1. The harmony factor α is 0.01 for ALBERT and 0.1 for all the other models we used. It is determined by hyper-parameter grid search from 1e-3 to 10 with a step of 10. (See the configuration sketch below the table.)
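
The Open Source Code row notes that Block-Skim is implemented on top of the open-sourced Transformers library from Wolf et al. (2019). For orientation only, the sketch below loads a stock extractive-QA model with that library; the checkpoint name and the use of AutoModelForQuestionAnswering are assumptions made here, not details taken from the paper or its repository.

```python
# Sketch only: a stock Hugging Face Transformers (Wolf et al. 2019) extractive-QA
# baseline of the kind Block-Skim builds on. The checkpoint name is an assumption;
# the QA head is randomly initialized until the model is fine-tuned on a QA dataset.
from transformers import AutoTokenizer, AutoModelForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForQuestionAnswering.from_pretrained("bert-base-uncased")

inputs = tokenizer(
    "What does Block-Skim accelerate?",            # question
    "Block-Skim speeds up extractive QA models.",  # context passage
    return_tensors="pt",
)
outputs = model(**inputs)  # start/end logits over the input tokens
print(outputs.start_logits.shape, outputs.end_logits.shape)
```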
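The Open Datasets and Dataset Splits rows list six public extractive-QA benchmarks and mention profiling on the SQuAD development set. A minimal sketch, assuming the Hugging Face `datasets` library and its `squad` identifier for SQuAD 1.1 (the other five benchmarks have their own hub identifiers, which the paper does not list):

```python
# Minimal sketch, not from the paper: load SQuAD 1.1 and select the development
# (validation) split on which the attention heatmaps are reported to be profiled.
from datasets import load_dataset

squad = load_dataset("squad")        # SQuAD 1.1 (Rajpurkar et al. 2016)
train_set = squad["train"]           # training split
dev_set = squad["validation"]        # development split used for profiling

print(len(train_set), len(dev_set))  # roughly 87.6k train / 10.6k dev examples
print(dev_set[0]["question"])
```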
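The Experiment Setup row fixes most fine-tuning hyper-parameters. The sketch below restates them as Hugging Face TrainingArguments; the quoted text does not say whether batch size 16 is global or per GPU, so the per-device value is an assumption for the four V100s, and the output directory name is invented for illustration. The last line spells out the multiplicative grid implied by "from 1e-3 to 10 with a step of 10" for the harmony factor α.

```python
# Sketch of the quoted hyper-parameters, not the authors' released configuration.
from transformers import TrainingArguments

SQUAD_MAX_SEQ_LEN = 384   # SQuAD: batch size 16, max sequence length 384
OTHER_MAX_SEQ_LEN = 512   # other datasets: batch size 32, max sequence length 512

args = TrainingArguments(
    output_dir="blockskim-bert-squad",   # hypothetical output directory
    learning_rate=3e-5,                  # 3e-5 for BERT; 5e-5 for ALBERT
    lr_scheduler_type="linear",          # linear learning-rate scheduler
    per_device_train_batch_size=4,       # assumption: 4 per GPU x 4 V100s = 16 total
    seed=42,                             # all reported runs use random seed 42
)

# Grid for the harmony factor alpha: 1e-3 to 10 in multiplicative steps of 10;
# the selected values (0.01 for ALBERT, 0.1 for the other models) lie on this grid.
alpha_grid = [10.0 ** e for e in range(-3, 2)]   # [0.001, 0.01, 0.1, 1.0, 10.0]
```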