DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning

Authors: Xinghao Wang, Junliang He, Pengyu Wang, Yunhua Zhou, Tianxiang Sun, Xipeng Qiu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks, standing up well in comparison to contrastive-learning-based methods.
Researcher Affiliation | Academia | School of Computer Science, Fudan University {xinghaowang22, jlhe22, pywang22}@m.fudan.edu.cn, {zhouyh20, txsun19, xpqiu}@fudan.edu.cn
Pseudocode | No | The paper describes the model architecture and processes in textual form and through diagrams, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code is available at https://github.com/xinghaow99/DenoSent.
Open Datasets | Yes | We use the unsupervised Wiki dataset used in SimCSE (Gao, Yao, and Chen 2021) as our self-supervised training dataset. For back-translation data augmentation, we use pre-trained machine translation models (Tiedemann and Thottingal 2020) to translate the training sentences to Chinese and then translate them back to English. (A back-translation sketch is given after the table.)
Dataset Splits | Yes | We perform a sweep on these parameters and then select the checkpoint that has the highest Spearman correlation on the STS-Benchmark development set for evaluation. (A checkpoint-selection sketch is given after the table.)
Hardware Specification | Yes | We conduct all the experiments on a machine with 8 NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using 'bert-base-uncased', the AdamW optimizer, and toolkits such as SentEval and MTEB, but it does not specify version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or the mentioned toolkits.
Experiment Setup | Yes | We use a learning rate of 5e-5 and AdamW (Loshchilov and Hutter 2017) as the optimizer. For the input sequence length, we use a value of 32. For the denoising objective, we use {0.8, 0.825, 0.85, 0.875, 0.9} as the dropout rates for continuous perturbations and {12, 14, 16} as the number of decoder transformer layers, perform a sweep on these parameters, and then select the checkpoint that has the highest Spearman correlation on the STS-Benchmark development set for evaluation. We use 0.825 as the dropout rate and 16 transformer layers for reported results. For the contrastive objective, we use a temperature τ = 0.03. (A training-setup sketch is given after the table.)
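
To make the back-translation augmentation from the Open Datasets row concrete, the sketch below round-trips English sentences through Chinese with OPUS-MT models (Tiedemann and Thottingal 2020). The specific Hugging Face checkpoint names and the batching details are assumptions for illustration; the paper does not state which exact models or code were used.

```python
# Minimal back-translation sketch (en -> zh -> en). The OPUS-MT checkpoint
# names below are assumed for illustration; the paper only cites Tiedemann
# and Thottingal (2020) for the pre-trained translation models.
from transformers import MarianMTModel, MarianTokenizer

def load_pair(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

def translate(sentences, tokenizer, model):
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**batch)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

en_zh = load_pair("Helsinki-NLP/opus-mt-en-zh")  # assumed checkpoint
zh_en = load_pair("Helsinki-NLP/opus-mt-zh-en")  # assumed checkpoint

sentences = ["A denoising objective for sentence representation learning."]
chinese = translate(sentences, *en_zh)       # English -> Chinese
paraphrases = translate(chinese, *zh_en)     # Chinese -> English augmentation
print(paraphrases)
```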
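The Dataset Splits and Experiment Setup rows describe selecting the best checkpoint by Spearman correlation on the STS-Benchmark development set. Below is a minimal sketch of that selection criterion; the `encode` callable and the dev-set tuples are placeholders, not the authors' evaluation harness (the paper mentions toolkits such as SentEval and MTEB).

```python
# Sketch of checkpoint selection: score each swept configuration by Spearman
# correlation between embedding cosine similarities and gold STS-B dev labels.
# `encode` and `dev_pairs` are placeholders, not the released evaluation code.
import numpy as np
from scipy.stats import spearmanr

def stsb_dev_spearman(encode, dev_pairs):
    """encode: list[str] -> np.ndarray embeddings; dev_pairs: (s1, s2, gold score)."""
    s1, s2, gold = zip(*dev_pairs)
    e1, e2 = encode(list(s1)), encode(list(s2))
    cosine = np.sum(e1 * e2, axis=1) / (
        np.linalg.norm(e1, axis=1) * np.linalg.norm(e2, axis=1)
    )
    return spearmanr(cosine, gold).correlation

def select_best_checkpoint(encoders_by_checkpoint, dev_pairs):
    scores = {ckpt: stsb_dev_spearman(enc, dev_pairs)
              for ckpt, enc in encoders_by_checkpoint.items()}
    return max(scores, key=scores.get), scores
```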
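The Experiment Setup row reports concrete hyperparameters: AdamW at a 5e-5 learning rate, input length 32, and a contrastive temperature of τ = 0.03. The sketch below wires those values into a generic BERT-based setup with an in-batch InfoNCE loss; the pooling strategy and loss formulation are assumptions, not the released DenoSent code, and the denoising decoder is omitted.

```python
# Hedged sketch of the reported hyperparameters: AdamW (lr 5e-5), max sequence
# length 32, and an InfoNCE-style contrastive loss with temperature 0.03.
# Pooling via the [CLS] token is an assumption; the denoising decoder with
# 16 transformer layers and 0.825 dropout is not reproduced here.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(encoder.parameters(), lr=5e-5)
MAX_LEN, TEMPERATURE = 32, 0.03

def embed(sentences):
    batch = tokenizer(sentences, max_length=MAX_LEN, truncation=True,
                      padding=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]  # [CLS] pooling (assumed)

def contrastive_loss(z1, z2, temperature=TEMPERATURE):
    """In-batch InfoNCE: positives on the diagonal, other sentences as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```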