Generalize Sentence Representation with Self-Inference

Authors: Kai-Chou Yang, Hung-Yu Kao (pp. 9394-9401)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method on four benchmarks among three NLP tasks. Experimental results demonstrate that our model sets a new state-of-the-art on MultiNLI, SciTail and is competitive on the remaining two datasets over all sentence encoding methods.
Researcher Affiliation | Academia | National Cheng Kung University, Tainan, Taiwan
Pseudocode | No | The paper describes the model architecture and equations but does not include pseudocode or an algorithm block.
Open Source Code | No | The paper does not provide any link or explicit statement about releasing open-source code for the described methodology.
Open Datasets | Yes | We evaluate our model on four widely-studied benchmark datasets among three NLP tasks: natural language inference, text classification and sentiment classification. ... MultiNLI (Williams, Nangia, and Bowman 2017), and SciTail (Khot, Sabharwal, and Clark 2018). ... AG News (Zhang, Zhao, and LeCun 2015)... SST (Socher et al. 2013)
Dataset Splits | Yes | For text classification, we use AG News (Zhang, Zhao, and LeCun 2015)... Since there is no official validation set for this dataset, we split 5% training data for early-stopping and hyper-parameter searching.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions software components and tools like FastText, GELU, and Adam, but does not specify their version numbers (e.g., 'FastText common-crawl vectors (Mikolov et al. 2018)', 'GELU (Hendrycks and Gimpel 2016)', 'Adam (Kingma and Ba 2014)').
Experiment Setup | Yes | We initialize word embeddings using the pretrained FastText common-crawl vectors (Mikolov et al. 2018) and freeze the weights during training. The character embedding is composed of (50, 55, 60, 65, 70) filters with kernel sizes (2, 3, 4, 5, 6) followed by max pooling. The CNN base encoder has k/4 filters with sizes (2, 3, 4, 5), respectively. The classifier is a neural network of two hidden layers followed by batch normalization and dropout layers. The objective function is cross-entropy loss, which is optimized by Adam (Kingma and Ba 2014) with a cyclic learning rate (Smith 2015). We applied label smoothing (Szegedy et al. 2015) to penalize confident predictions for NLI datasets and regularize the models with the L2 penalty. We search for the optimal hyper-parameters for each task by grid search.
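All four benchmarks cited in the Open Datasets row are publicly available. Below is a minimal loading sketch, assuming the Hugging Face `datasets` library and that the corpora are mirrored on the Hub under the identifiers used here; the identifiers and the SciTail configuration name are assumptions, not references taken from the paper.

```python
# Hypothetical loading of the four benchmarks (Hub identifiers are assumptions).
from datasets import load_dataset

multi_nli = load_dataset("multi_nli")            # NLI (Williams, Nangia, and Bowman 2017)
scitail = load_dataset("scitail", "tsv_format")  # NLI (Khot, Sabharwal, and Clark 2018)
ag_news = load_dataset("ag_news")                # text classification (Zhang, Zhao, and LeCun 2015)
sst = load_dataset("sst")                        # sentiment (Socher et al. 2013)

for name, ds in [("MultiNLI", multi_nli), ("SciTail", scitail),
                 ("AG News", ag_news), ("SST", sst)]:
    print(name, {split: len(ds[split]) for split in ds})
```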
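The Dataset Splits row notes that AG News has no official validation set, so 5% of the training data is held out for early stopping and hyper-parameter search. A sketch of that hold-out, again assuming the `datasets` library and an arbitrary seed:

```python
# Hold out 5% of AG News training data as a development set (the seed is an assumption).
from datasets import load_dataset

ag_news = load_dataset("ag_news")
split = ag_news["train"].train_test_split(test_size=0.05, seed=13)
train_set, dev_set = split["train"], split["test"]   # dev_set drives early stopping
print(len(train_set), len(dev_set), len(ag_news["test"]))
```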
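The Experiment Setup row fixes most of the architecture and optimization choices. The PyTorch sketch below mirrors that description; the character vocabulary, embedding and hidden dimensions, the value of k, the dropout rate, learning rates, weight decay, and label-smoothing factor are illustrative assumptions rather than values reported by the authors, and the paper's self-inference composition itself is not reproduced here.

```python
# Hypothetical PyTorch sketch of the setup described above. Filter counts, kernel
# sizes, GELU, Adam, cyclic LR, label smoothing, and the L2 penalty follow the
# paper's text; every concrete number below (char vocab, dims, k, hidden size,
# dropout, learning rates, weight decay, smoothing factor) is an assumption.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CharCNNEmbedding(nn.Module):
    """Character embedding: parallel convolutions followed by max pooling over characters."""

    def __init__(self, n_chars=262, char_dim=16,
                 n_filters=(50, 55, 60, 65, 70), kernel_sizes=(2, 3, 4, 5, 6)):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim, padding_idx=0)
        self.convs = nn.ModuleList(
            nn.Conv1d(char_dim, f, k) for f, k in zip(n_filters, kernel_sizes)
        )

    def forward(self, char_ids):                      # (batch, seq_len, chars_per_word)
        b, s, c = char_ids.shape
        x = self.char_emb(char_ids.view(b * s, c)).transpose(1, 2)     # (b*s, char_dim, c)
        pooled = [conv(x).max(dim=-1).values for conv in self.convs]   # max pool over characters
        return torch.cat(pooled, dim=-1).view(b, s, -1)                # (b, s, 50+55+60+65+70 = 300)


class CNNBaseEncoder(nn.Module):
    """Base encoder: k/4 filters for each kernel size in (2, 3, 4, 5)."""

    def __init__(self, in_dim, k=1200, kernel_sizes=(2, 3, 4, 5)):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(in_dim, k // 4, ks, padding=ks // 2) for ks in kernel_sizes
        )

    def forward(self, x):                             # (batch, seq_len, in_dim)
        x = x.transpose(1, 2)
        feats = [F.gelu(conv(x)) for conv in self.convs]
        min_len = min(f.size(-1) for f in feats)      # padding differs by one step across kernel sizes
        return torch.cat([f[..., :min_len] for f in feats], dim=1).transpose(1, 2)


class Classifier(nn.Module):
    """Two hidden layers, each followed by batch normalization and dropout."""

    def __init__(self, in_dim, hidden=512, n_classes=3, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.GELU(), nn.BatchNorm1d(hidden), nn.Dropout(p_drop),
            nn.Linear(hidden, hidden), nn.GELU(), nn.BatchNorm1d(hidden), nn.Dropout(p_drop),
            nn.Linear(hidden, n_classes),
        )

    def forward(self, x):
        return self.net(x)


model = nn.ModuleDict({
    "char_emb": CharCNNEmbedding(),
    "encoder": CNNBaseEncoder(in_dim=300 + 300),      # frozen 300-d FastText vectors + 300-d char features
    "classifier": Classifier(in_dim=1200),
})

# Cross-entropy with label smoothing (NLI datasets), L2 penalty via weight_decay,
# and Adam with a cyclic learning-rate schedule (cycle_momentum=False because
# Adam exposes no momentum parameter for the scheduler to cycle).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)
scheduler = torch.optim.lr_scheduler.CyclicLR(
    optimizer, base_lr=1e-4, max_lr=1e-3, step_size_up=2000, cycle_momentum=False)

# Toy forward pass just to check shapes; real inputs come from FastText lookups
# and a character vocabulary. Max pooling over time is a stand-in for the
# paper's self-inference composition, which is omitted here.
word_vecs = torch.randn(2, 20, 300)
char_ids = torch.randint(1, 262, (2, 20, 12))
feats = model["encoder"](torch.cat([word_vecs, model["char_emb"](char_ids)], dim=-1))
logits = model["classifier"](feats.max(dim=1).values)
loss = criterion(logits, torch.tensor([0, 2]))        # optimizer.step() / scheduler.step() follow per batch
```

As the quoted setup notes, the concrete hyper-parameters would be chosen per task by grid search; only single placeholder values appear above.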