Generalize Sentence Representation with Self-Inference

Authors: Kai-Chou Yang, Hung-Yu Kao (pp. 9394-9401)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method on four benchmarks among three NLP tasks. Experimental results demonstrate that our model sets a new state-of-the-art on MultiNLI, SciTail and is competitive on the remaining two datasets over all sentence encoding methods.
Researcher Affiliation | Academia | National Cheng Kung University, Tainan, Taiwan
Pseudocode | No | The paper describes the model architecture and equations but does not include pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide any link or explicit statement about releasing open-source code for the described methodology.
Open Datasets | Yes | We evaluate our model on four widely-studied benchmark datasets among three NLP tasks: natural language inference, text classification and sentiment classification. ... MultiNLI (Williams, Nangia, and Bowman 2017), and SciTail (Khot, Sabharwal, and Clark 2018). ... AG News (Zhang, Zhao, and LeCun 2015)... SST (Socher et al. 2013)
Dataset Splits | Yes | For text classification, we use AG News (Zhang, Zhao, and LeCun 2015)... Since there is no official validation set for this dataset, we split 5% training data for early-stopping and hyper-parameter searching.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software components and tools like FastText, GELU, and Adam, but does not specify their version numbers (e.g., 'FastText common-crawl vectors (Mikolov et al. 2018)', 'GELU (Hendrycks and Gimpel 2016)', 'Adam (Kingma and Ba 2014)').
Experiment Setup | Yes | We initialize word embeddings using the pretrained FastText common-crawl vectors (Mikolov et al. 2018) and freeze the weights during training. The character embedding is composed of (50, 55, 60, 65, 70) filters with kernel sizes (2, 3, 4, 5, 6) followed by max pooling. The CNN base encoder has k/4 filters with sizes (2, 3, 4, 5), respectively. The classifier is a neural network of two hidden layers followed by batch normalization and dropout layers. The objective function is cross-entropy loss, which is optimized by Adam (Kingma and Ba 2014) with a cyclic learning rate (Smith 2015). We applied label smoothing (Szegedy et al. 2015) to penalize confident predictions for NLI datasets and regularize the models with the L2 penalty. We search for the optimal hyper-parameters for each task by grid search.
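All four benchmarks cited in the Open Datasets row are publicly available. Below is a minimal loading sketch, assuming the Hugging Face `datasets` library and that the corpora are mirrored on the Hub under the identifiers used here; the identifiers and the SciTail configuration name are assumptions, not references taken from the paper.

```python
# Hypothetical loading of the four benchmarks (Hub identifiers are assumptions).
from datasets import load_dataset

multi_nli = load_dataset("multi_nli")            # NLI (Williams, Nangia, and Bowman 2017)
scitail = load_dataset("scitail", "tsv_format")  # NLI (Khot, Sabharwal, and Clark 2018)
ag_news = load_dataset("ag_news")                # text classification (Zhang, Zhao, and LeCun 2015)
sst = load_dataset("sst")                        # sentiment (Socher et al. 2013)

for name, ds in [("MultiNLI", multi_nli), ("SciTail", scitail),
                 ("AG News", ag_news), ("SST", sst)]:
    print(name, {split: len(ds[split]) for split in ds})
```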
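The Dataset Splits row notes that AG News has no official validation set, so 5% of the training data is held out for early stopping and hyper-parameter search. A sketch of that hold-out, again assuming the `datasets` library and an arbitrary seed:

```python
# Hold out 5% of AG News training data as a development set (the seed is an assumption).
from datasets import load_dataset

ag_news = load_dataset("ag_news")
split = ag_news["train"].train_test_split(test_size=0.05, seed=13)
train_set, dev_set = split["train"], split["test"]   # dev_set drives early stopping
print(len(train_set), len(dev_set), len(ag_news["test"]))
```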
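The Experiment Setup row fixes most of the architecture and optimization choices. The PyTorch sketch below mirrors that description; the character vocabulary, embedding and hidden dimensions, the value of k, the dropout rate, learning rates, weight decay, and label-smoothing factor are illustrative assumptions rather than values reported by the authors, and the paper's self-inference composition itself is not reproduced here.

```python
# Hypothetical PyTorch sketch of the setup described above. Filter counts, kernel
# sizes, GELU, Adam, cyclic LR, label smoothing, and the L2 penalty follow the
# paper's text; every concrete number below (char vocab, dims, k, hidden size,
# dropout, learning rates, weight decay, smoothing factor) is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CharCNNEmbedding(nn.Module):
    """Character embedding: parallel convolutions followed by max pooling over characters."""

    def __init__(self, n_chars=262, char_dim=16,
                 n_filters=(50, 55, 60, 65, 70), kernel_sizes=(2, 3, 4, 5, 6)):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, f, k) for f, k in zip(n_filters, kernel_sizes)
        )

    def forward(self, char_ids):                      # (batch, seq_len, chars_per_word)
        b, s, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * s, c)).transpose(1, 2)     # (b*s, char_dim, c)
        pooled = [conv(x).max(dim=-1).values for conv in self.convs]   # max pool over characters
        return torch.cat(pooled, dim=-1).view(b, s, -1)                # (b, s, 50+55+60+65+70 = 300)


class CNNBaseEncoder(nn.Module):
    """Base encoder: k/4 filters for each kernel size in (2, 3, 4, 5)."""

    def __init__(self, in_dim, k=1200, kernel_sizes=(2, 3, 4, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(in_dim, k // 4, ks, padding=ks // 2) for ks in kernel_sizes
        )

    def forward(self, x):                             # (batch, seq_len, in_dim)
        x = x.transpose(1, 2)
        feats = [F.gelu(conv(x)) for conv in self.convs]
        min_len = min(f.size(-1) for f in feats)      # padding differs by one step across kernel sizes
        return torch.cat([f[..., :min_len] for f in feats], dim=1).transpose(1, 2)


class Classifier(nn.Module):
    """Two hidden layers, each followed by batch normalization and dropout."""

    def __init__(self, in_dim, hidden=512, n_classes=3, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(), nn.BatchNorm1d(hidden), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.GELU(), nn.BatchNorm1d(hidden), nn.Dropout(p_drop),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)


model = nn.ModuleDict({
    "char_emb": CharCNNEmbedding(),
    "encoder": CNNBaseEncoder(in_dim=300 + 300),      # frozen 300-d FastText vectors + 300-d char features
    "classifier": Classifier(in_dim=1200),
})

# Cross-entropy with label smoothing (NLI datasets), L2 penalty via weight_decay,
# and Adam with a cyclic learning-rate schedule (cycle_momentum=False because
# Adam exposes no momentum parameter for the scheduler to cycle).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-3, step_size_up=2000, cycle_momentum=False)

# Toy forward pass just to check shapes; real inputs come from FastText lookups
# and a character vocabulary. Max pooling over time is a stand-in for the
# paper's self-inference composition, which is omitted here.
word_vecs = torch.randn(2, 20, 300)
char_ids = torch.randint(1, 262, (2, 20, 12))
feats = model["encoder"](torch.cat([word_vecs, model["char_emb"](char_ids)], dim=-1))
logits = model["classifier"](feats.max(dim=1).values)
loss = criterion(logits, torch.tensor([0, 2]))        # optimizer.step() / scheduler.step() follow per batch
```

As the quoted setup notes, the concrete hyper-parameters would be chosen per task by grid search; only single placeholder values appear above.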