Sublinear Time Approximation of Text Similarity Matrices
Authors: Archan Ray, Nicholas Monath, Andrew McCallum, Cameron Musco
AAAI 2022, pp. 8072-8080 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our method, along with a simple variant of CUR decomposition, performs very well in approximating a variety of similarity matrices arising in NLP tasks. We demonstrate high accuracy of the approximated similarity matrices in tasks of document classification, sentence similarity, and cross-document coreference. |
| Researcher Affiliation | Collaboration | University of Massachusetts Amherst; {ray, nmonath, mccallum, cmusco}@cs.umass.edu; "Now at Google." |
| Pseudocode | Yes | Algorithm 1: Submatrix-Shifted Nyström (SMS-Nyström). A hedged code sketch of this routine appears after the table. |
| Open Source Code | No | The paper refers to a 'full version' on arXiv (Ray et al. 2021), but this is a paper reference, not a link to source code. No other concrete access to source code is provided. |
| Open Datasets | Yes | We evaluate the performance of our embeddings in multi-class classification for four different corpora drawn from (Huang et al. 2016; Kusner et al. 2015): Twitter (2176 train, 932 test), Recipe-L (27841 train, 11933 test), Ohsumed (3999 train, 5153 test), and 20News (11293 train, 7528 test). We consider three GLUE benchmark datasets: STS-B, MRPC, and RTE. For each task, we first train the BERT model on the test set, using code from (Wolf et al. 2019). We compute the full BERT similarity matrix for all sentences in the validation set, which consists of a set of sentence pairs, each with a true score derived from human judgements. We evaluate the approximation error and the downstream task performance (CoNLL F1 (Pradhan et al. 2014)) of approximating the symmetrized similarity matrix of the model on the Event Coref Bank+ corpus (Cybulska and Vossen 2014). |
| Dataset Splits | Yes | We evaluate the performance of our embeddings in multi-class classification for four different corpora drawn from (Huang et al. 2016; Kusner et al. 2015): Twitter (2176 train, 932 test), Recipe-L (27841 train, 11933 test), Ohsumed (3999 train, 5153 test), and 20News (11293 train, 7528 test). We compute the full BERT similarity matrix for all sentences in the validation set, which consists of a set of sentence pairs, each with a true score derived from human judgements. |
| Hardware Specification | No | The paper mentions 'high performance computing equipment obtained under a grant from the Collaborative R&D Fund managed by the Massachusetts Technology Collaborative' in the acknowledgements, but it does not specify any particular CPU, GPU models, or memory details. |
| Software Dependencies | No | The paper mentions BERT, PyTorch (implicitly by using code from Wolf et al. 2019 for BERT), and cross-encoder models, but it does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | To further ensure this, we can multiply e by a small constant factor α > 1 (we typically use α = 1.5) before applying the shift. For SMS-Nyström, we write K = ZZᵀ and use Z as document embeddings (see Alg. 1). We evaluate the performance of our embeddings in multi-class classification for four different corpora... at several sample sizes s. We first train the BERT model on the test set, using code from (Wolf et al. 2019). An illustrative end-to-end sketch follows this table. |
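As referenced in the Pseudocode row, the paper's Algorithm 1 is Submatrix-Shifted Nyström (SMS-Nyström). Below is a minimal NumPy sketch reconstructed only from the details quoted above: the α = 1.5 shift factor and the K ≈ ZZᵀ factorization come from the paper's text, while the function name `sms_nystrom`, the entrywise `sim(i, j)` access model, and the (n/s) scaling of the eigenvalue estimate are assumptions for illustration, not the authors' exact specification.

```python
import numpy as np

def sms_nystrom(sim, n, s, alpha=1.5, rng=None):
    """Sketch of Submatrix-Shifted Nystrom (SMS-Nystrom).

    sim(i, j): entrywise access to a symmetric (possibly indefinite)
    n x n similarity matrix A. Returns Z (n x s) with Z @ Z.T
    approximating the shifted matrix A + e*I, using only O(n * s)
    similarity evaluations (sublinear in the n^2 entries of A).
    """
    rng = np.random.default_rng(rng)
    S = rng.choice(n, size=s, replace=False)                  # landmark sample
    C = np.array([[sim(i, j) for j in S] for i in range(n)])  # n x s column block
    K_S = C[S, :]                                             # s x s landmark submatrix
    K_S = (K_S + K_S.T) / 2.0                                 # guard against asymmetry

    # Estimate the most negative eigenvalue of the full matrix from the
    # sampled submatrix (the n/s scaling is an assumption here), then shift
    # by alpha times that estimate, per the alpha = 1.5 quoted above.
    lam_min = np.linalg.eigvalsh(K_S).min()
    e = alpha * (n / s) * max(0.0, -lam_min)
    C[S, np.arange(s)] += e        # the shift A + e*I touches sampled diagonal entries
    W = K_S + e * np.eye(s)        # shifted landmark block, now (near-)PSD

    # Standard Nystrom factor: Z = C @ W^{-1/2}, so Z @ Z.T ~= C @ inv(W) @ C.T.
    vals, vecs = np.linalg.eigh(W)
    inv_sqrt = vecs @ np.diag(1.0 / np.sqrt(np.maximum(vals, 1e-12))) @ vecs.T
    return C @ inv_sqrt
```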
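And a hypothetical end-to-end usage on synthetic data, mirroring the quoted setup of treating the rows of Z as document embeddings for classification. The synthetic `sim` function, the kNN classifier, the two-class labels, and the 80/20 split are all illustrative stand-ins, not the paper's corpora or classifier.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n, s = 1000, 100                              # corpus size and sample size (made up)
X = rng.standard_normal((n, 16))              # stand-in document features
y = (X[:, 0] > 0).astype(int)                 # stand-in class labels

def sim(i, j):
    # A bounded, not-necessarily-PSD similarity, for demonstration only.
    return float(np.tanh(X[i] @ X[j]))

Z = sms_nystrom(sim, n=n, s=s, alpha=1.5, rng=0)   # rows of Z are the embeddings

train, test = np.arange(800), np.arange(800, n)
clf = KNeighborsClassifier(n_neighbors=7).fit(Z[train], y[train])
print("test accuracy:", clf.score(Z[test], y[test]))
```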