Semantic Re-tuning with Contrastive Tension

Authors: Fredrik Carlsson, Amaru Cuba Gyllensten, Evangelia Gogoulou, Erik Ylipää Hellqvist, Magnus Sahlgren

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Results from multiple common unsupervised and supervised STS tasks indicate that CT outperforms previous State Of The Art (SOTA), and when combining CT with supervised data we improve upon previous SOTA results with large margins."
Researcher Affiliation | Academia | "Fredrik Carlsson, Evangelia Gogoulou, Erik Ylipää, Amaru Cuba Gyllensten, Magnus Sahlgren, RISE NLU Group, {firstname.lastname}@ri.se"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code and models are available at Github.com/FreddeFrallan/Contrastive-Tension"
Open Datasets | Yes | "Training data is randomly sampled from English Wikipedia (See Appendix C.2), where we collect K = 7 negative sentence pairs for each positive sentence pair." (Table 12: English, https://dumps.wikimedia.org/enwiki/20200820/enwiki-20200820-pages-articles-multistream.xml.bz2)
Dataset Splits | Yes | "These sentence embeddings are directly evaluated towards the STS-b test (Cer et al., 2017), without any additional training, from which we report the Spearman correlation between the cosine similarity of the embeddings and the manually collected similarity scores." The test partition of the dataset contains 1,379 sentence pairs... (Also: "Table 2 shows the test results of the model that performed best on the validation set.") (See the evaluation sketch below.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types and speeds, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using the "Huggingface API" and the SentEval package but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | "Unless stated otherwise, the following set of hyperparameters is applied when using CT throughout all experiments: Training data is randomly sampled from English Wikipedia (See Appendix C.2), where we collect K = 7 negative sentence pairs for each positive sentence pair. The batch size is set to 16, which results in every batch having 2 positive sentence pairs and 14 negative sentence pairs. We apply an RMSProp optimizer (Hinton, 2012) with a fixed learning rate schedule that decreases from 1e-5 to 2e-6 (Appendix A.3)." (See the training sketch below.)
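The Experiment Setup row pins down the CT training configuration: K = 7 negative pairs per positive pair, batch size 16 (2 positive + 14 negative pairs per batch), and RMSProp with a learning rate decaying from 1e-5 to 2e-6. Below is a minimal PyTorch/Huggingface sketch of that configuration, assuming the CT objective of training two independent copies of a pretrained model with a binary logistic loss on the dot product of their sentence embeddings (label 1 for identical sentences, 0 for negatives). The base model, mean pooling, and the linear decay schedule are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the CT training setup quoted above; pooling, base
# model, and the exact decay schedule are assumptions for illustration.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"   # assumed base model
K = 7                              # negative pairs per positive pair (paper)
BATCH_SIZE = 16                    # 2 positive + 14 negative pairs (paper)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model_a = AutoModel.from_pretrained(MODEL_NAME)   # two *independent* copies
model_b = AutoModel.from_pretrained(MODEL_NAME)

def embed(model, sentences):
    """Mean-pool the final hidden states into one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state              # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # (B, H)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.RMSprop(
    list(model_a.parameters()) + list(model_b.parameters()), lr=1e-5
)
# The paper uses a fixed schedule from 1e-5 down to 2e-6; a linear decay
# over an illustrative number of steps is used here as a stand-in.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.2, total_iters=10_000
)

def training_step(sents_a, sents_b, labels):
    """sents_a/sents_b each hold BATCH_SIZE sentences; labels is a float
    tensor with 1.0 for identical-sentence pairs and 0.0 for negatives."""
    logits = (embed(model_a, sents_a) * embed(model_b, sents_b)).sum(-1)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```

The 2-positive / 14-negative split per batch follows directly from the quoted setup: each positive pair is accompanied by K = 7 negative pairs, so a batch of 16 pairs contains exactly two positives.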
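The Dataset Splits row quotes the unsupervised STS-b protocol: frozen sentence embeddings are compared with cosine similarity and scored by Spearman correlation against the human similarity labels. A minimal sketch of that evaluation follows; the function and argument names are hypothetical, and loading the 1,379 test pairs is left out.

```python
# Hypothetical sketch of the unsupervised STS-b evaluation quoted above:
# cosine similarity between frozen embeddings, scored with Spearman
# correlation against the human similarity labels.
import torch
from scipy.stats import spearmanr

def evaluate_stsb(embed_fn, sentence_pairs, gold_scores):
    """embed_fn maps a list of sentences to an (N, H) tensor of embeddings;
    sentence_pairs is a list of (s1, s2); gold_scores are the human labels."""
    emb1 = embed_fn([s1 for s1, _ in sentence_pairs])
    emb2 = embed_fn([s2 for _, s2 in sentence_pairs])
    cosine = torch.nn.functional.cosine_similarity(emb1, emb2, dim=-1)
    correlation, _ = spearmanr(cosine.tolist(), gold_scores)
    return correlation
```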