Semantic Re-tuning with Contrastive Tension

Authors: Fredrik Carlsson, Amaru Cuba Gyllensten, Evangelia Gogoulou, Erik Ylipää Hellqvist, Magnus Sahlgren

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Results from multiple common unsupervised and supervised STS tasks indicate that CT outperforms previous State Of The Art (SOTA), and when combining CT with supervised data we improve upon previous SOTA results with large margins."
Researcher Affiliation | Academia | "Fredrik Carlsson, Evangelia Gogoulou, Erik Ylipää, Amaru Cuba Gyllensten, Magnus Sahlgren, RISE NLU Group, {firstname.lastname}@ri.se"
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code and models are available at Github.com/FreddeFrallan/Contrastive-Tension"
Open Datasets | Yes | "Training data is randomly sampled from English Wikipedia (See Appendix C.2), where we collect K = 7 negative sentence pairs for each positive sentence pair." (Table 12: English, https://dumps.wikimedia.org/enwiki/20200820/enwiki-20200820-pages-articles-multistream.xml.bz2)
Dataset Splits | Yes | "These sentence embeddings are directly evaluated towards the STS-b test (Cer et al., 2017), without any additional training, from which we report the Spearman correlation between the cosine similarity of the embeddings and the manually collected similarity scores." The test partition of the dataset contains 1,379 sentence pairs... (Also: "Table 2 shows the test results of the model that performed best on the validation set.") (See the evaluation sketch below.)
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types and speeds, or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using the "Huggingface API" and the SentEval package but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | "Unless stated otherwise, the following set of hyperparameters is applied when using CT throughout all experiments: Training data is randomly sampled from English Wikipedia (See Appendix C.2), where we collect K = 7 negative sentence pairs for each positive sentence pair. The batch size is set to 16, which results in every batch having 2 positive sentence pairs and 14 negative sentence pairs. We apply an RMSProp optimizer (Hinton, 2012) with a fixed learning rate schedule that decreases from 1e-5 to 2e-6 (Appendix A.3)." (See the training sketch below.)
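The Experiment Setup row pins down the CT training configuration: K = 7 negative pairs per positive pair, batch size 16 (2 positive + 14 negative pairs per batch), and RMSProp with a learning rate decaying from 1e-5 to 2e-6. Below is a minimal PyTorch/Huggingface sketch of that configuration, assuming the CT objective of training two independent copies of a pretrained model with a binary logistic loss on the dot product of their sentence embeddings (label 1 for identical sentences, 0 for negatives). The base model, mean pooling, and the linear decay schedule are illustrative assumptions, not the authors' released code.

```python
# Hypothetical sketch of the CT training setup quoted above; pooling, base
# model, and the exact decay schedule are assumptions for illustration.
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

MODEL_NAME = "bert-base-uncased"   # assumed base model
K = 7                              # negative pairs per positive pair (paper)
BATCH_SIZE = 16                    # 2 positive + 14 negative pairs (paper)

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model_a = AutoModel.from_pretrained(MODEL_NAME)   # two *independent* copies
model_b = AutoModel.from_pretrained(MODEL_NAME)

def embed(model, sentences):
    """Mean-pool the final hidden states into one vector per sentence."""
    batch = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    hidden = model(**batch).last_hidden_state              # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1).float()   # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)            # (B, H)

criterion = nn.BCEWithLogitsLoss()
optimizer = torch.optim.RMSprop(
    list(model_a.parameters()) + list(model_b.parameters()), lr=1e-5
)
# The paper uses a fixed schedule from 1e-5 down to 2e-6; a linear decay
# over an illustrative number of steps is used here as a stand-in.
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.2, total_iters=10_000
)

def training_step(sents_a, sents_b, labels):
    """sents_a/sents_b each hold BATCH_SIZE sentences; labels is a float
    tensor with 1.0 for identical-sentence pairs and 0.0 for negatives."""
    logits = (embed(model_a, sents_a) * embed(model_b, sents_b)).sum(-1)
    loss = criterion(logits, labels)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
    return loss.item()
```

The 2-positive / 14-negative split per batch follows directly from the quoted setup: each positive pair is accompanied by K = 7 negative pairs, so a batch of 16 pairs contains exactly two positives.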
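The Dataset Splits row quotes the unsupervised STS-b protocol: frozen sentence embeddings are compared with cosine similarity and scored by Spearman correlation against the human similarity labels. A minimal sketch of that evaluation follows; the function and argument names are hypothetical, and loading the 1,379 test pairs is left out.

```python
# Hypothetical sketch of the unsupervised STS-b evaluation quoted above:
# cosine similarity between frozen embeddings, scored with Spearman
# correlation against the human similarity labels.
import torch
from scipy.stats import spearmanr

def evaluate_stsb(embed_fn, sentence_pairs, gold_scores):
    """embed_fn maps a list of sentences to an (N, H) tensor of embeddings;
    sentence_pairs is a list of (s1, s2); gold_scores are the human labels."""
    emb1 = embed_fn([s1 for s1, _ in sentence_pairs])
    emb2 = embed_fn([s2 for _, s2 in sentence_pairs])
    cosine = torch.nn.functional.cosine_similarity(emb1, emb2, dim=-1)
    correlation, _ = spearmanr(cosine.tolist(), gold_scores)
    return correlation
```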