Continuous Self-Attention Models with Neural ODE Networks
Authors: Jing Zhang, Peng Zhang, Baiwen Kong, Junqiu Wei, Xin Jiang (pp. 14393-14401)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a series of experiments on text classification, natural language inference (NLI) and text matching tasks. |
| Researcher Affiliation | Collaboration | Jing Zhang¹, Peng Zhang¹*, Baiwen Kong¹, Junqiu Wei², Xin Jiang²; ¹College of Intelligence and Computing, Tianjin University, Tianjin, China; ²Huawei Noah's Ark Lab, China |
| Pseudocode | No | The paper describes the model architecture and components but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of the described methodology. |
| Open Datasets | Yes | MR (Pang and Lee 2004a): Movie reviews are divided into positive and negative categories; CR (Hu and Liu 2004): Customer reviews set where the task is to predict positive or negative product reviews; SUBJ (Pang and Lee 2004b): Subjectivity dataset where the target is to classify a text as being subjective or objective; MPQA (Wiebe, Wilson, and Cardie 2005): Opinion polarity detection subtask; TREC (Li and Roth 2002): question classification dataset which involves classifying a question into 6 question types. ... SNLI (Bowman et al. 2015): Stanford Natural Language Inference is a benchmark dataset for natural language inference. ... WikiQA (Yang, Yih, and Meek 2015) is a retrieval-based question answering dataset based on Wikipedia |
| Dataset Splits | No | The paper evaluates on test sets but does not specify the training, validation, and test splits (e.g., percentages or sample counts) for reproducibility, nor does it explicitly reference predefined splits with clear citations. |
| Hardware Specification | Yes | For all tasks, we implement our model with Pytorch-1.20, and train them on a Nvidia P40 GPU. |
| Software Dependencies | Yes | For all tasks, we implement our model with Pytorch-1.20 |
| Experiment Setup | Yes | Word embeddings are initialized by GloVe (Pennington, Socher, and Manning 2014) with 300 dimensions. All other parameters are initialized with Xavier (Glorot and Bengio 2010) and normalized by weight normalization (Salimans and Kingma 2016). As for the learning method, we use the Adam optimizer (Kingma and Ba 2014) and an exponentially decaying learning rate with a linear warm-up. The dimension of the hidden vectors is set to 300, which is equal to the word embedding size. As for convolution, the filter size is set to 2. In addition, dropout with a keep probability of 0.1 is applied in the layers. The initial learning rate is set from 0.0001 to 0.003 and the batch size is tuned from 80 to 256. The L2 regularization decay factor is 10⁻⁵. In addition, the initial step size of the self-attention ODE solver is tuned from 10⁻² to 5×10⁻¹. (A minimal configuration sketch follows the table.) |
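
The setup reported above (GloVe-sized hidden vectors, Xavier initialization with weight normalization, Adam with L2 decay, and an exponentially decaying learning rate with linear warm-up) can be expressed as a short PyTorch configuration. The sketch below is an illustration of that description, not the authors' released code: the stand-in model, warm-up length, and per-step decay rate are assumptions, since the paper does not report them.

```python
# Hedged sketch of the reported training configuration. The Sequential model is a
# placeholder for the paper's continuous self-attention network; WARMUP_STEPS and
# DECAY_RATE are assumed values, not reported in the paper.
import torch
import torch.nn as nn

EMBED_DIM = 300      # GloVe 300-d embeddings; hidden size equals embedding size
LR = 1e-3            # initial learning rate, tuned in [1e-4, 3e-3]
L2_DECAY = 1e-5      # L2 regularization decay factor
WARMUP_STEPS = 4000  # linear warm-up length (assumed)
DECAY_RATE = 0.999   # per-step exponential decay (assumed)

# Stand-in for the paper's model, used only to make the sketch runnable.
model = nn.Sequential(nn.Linear(EMBED_DIM, EMBED_DIM), nn.ReLU(),
                      nn.Linear(EMBED_DIM, 2))

# Xavier initialization followed by weight normalization, as reported.
for module in model.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)
        nn.utils.weight_norm(module)

# Adam with L2 weight decay; exponentially decaying LR with a linear warm-up.
optimizer = torch.optim.Adam(model.parameters(), lr=LR, weight_decay=L2_DECAY)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lambda step: (step + 1) / WARMUP_STEPS if step < WARMUP_STEPS
    else DECAY_RATE ** (step - WARMUP_STEPS))
```

In a training loop, `scheduler.step()` would be called once per optimizer step so the learning rate first ramps up linearly and then decays exponentially, matching the schedule described in the paper.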