DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning
Authors: Xinghao Wang, Junliang He, Pengyu Wang, Yunhua Zhou, Tianxiang Sun, Xipeng Qiu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks, standing up well in comparison to contrastive-learning-based methods. |
| Researcher Affiliation | Academia | School of Computer Science, Fudan University {xinghaowang22, jlhe22, pywang22}@m.fudan.edu.cn, {zhouyh20, txsun19, xpqiu}@fudan.edu.cn |
| Pseudocode | No | The paper describes the model architecture and processes in textual form and through diagrams, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/xinghaow99/DenoSent. |
| Open Datasets | Yes | We use the unsupervised Wiki dataset used in SimCSE (Gao, Yao, and Chen 2021) as our self-supervised training dataset. For back translation data augmentation, we use pre-trained machine translation models (Tiedemann and Thottingal 2020) to translate the training sentences to Chinese and then translate them back to English. (A back-translation sketch follows the table.) |
| Dataset Splits | Yes | We perform a sweep on these parameters and then select the checkpoint that has the highest Spearman correlation on the STS-Benchmark development set for evaluation. |
| Hardware Specification | Yes | We conduct all the experiments on a machine with 8 NVIDIA GeForce RTX 3090 GPUs. |
| Software Dependencies | No | The paper mentions using 'bert-base-uncased' and the 'AdamW' optimizer, and toolkits like 'SentEval' and 'MTEB'. However, it does not specify version numbers for general software dependencies like Python, PyTorch, TensorFlow, or the mentioned toolkits. |
| Experiment Setup | Yes | We use a learning rate of 5e-5 and AdamW (Loshchilov and Hutter 2017) as the optimizer. For the input sequence length, we use a value of 32. For the denoising objective, we use {0.8, 0.825, 0.85, 0.875, 0.9} as the dropout rates for continuous perturbations and {12, 14, 16} as the number of decoder transformer layers; we perform a sweep on these parameters and then select the checkpoint that has the highest Spearman correlation on the STS-Benchmark development set for evaluation. We use 0.825 as the dropout rate and 16 transformer layers for reported results. For the contrastive objective, we use a temperature τ = 0.03. (A configuration sketch follows the table.) |
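
The back-translation augmentation quoted in the "Open Datasets" row can be approximated with off-the-shelf OPUS-MT checkpoints (Tiedemann and Thottingal 2020). The snippet below is a minimal sketch, not the authors' exact pipeline: the specific Helsinki-NLP model names, batching, and decoding settings are assumptions.

```python
# Minimal back-translation sketch (English -> Chinese -> English) using
# OPUS-MT checkpoints. The exact models and decoding settings used by the
# DenoSent authors are not stated in this report; treat them as assumptions.
from transformers import MarianMTModel, MarianTokenizer

EN_ZH = "Helsinki-NLP/opus-mt-en-zh"  # assumed checkpoint
ZH_EN = "Helsinki-NLP/opus-mt-zh-en"  # assumed checkpoint


def _translate(sentences, model_name):
    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch, max_new_tokens=64)
    return tokenizer.batch_decode(generated, skip_special_tokens=True)


def back_translate(sentences):
    """Translate EN -> ZH -> EN to obtain paraphrase-style augmentations."""
    zh = _translate(sentences, EN_ZH)
    return _translate(zh, ZH_EN)


if __name__ == "__main__":
    print(back_translate(["A man is playing a guitar on stage."]))
```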
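
The "Experiment Setup" row lists the swept hyperparameters, the checkpoint-selection rule, and the contrastive temperature. Below is a minimal sketch of that grid together with an InfoNCE-style contrastive term at τ = 0.03; everything beyond the values quoted in the table (function names, batch construction, the exact loss formulation) is an assumption rather than the paper's code.

```python
# Sketch of the hyperparameter grid from the "Experiment Setup" row and an
# InfoNCE-style contrastive loss with temperature tau = 0.03. Values marked
# "quoted" come from the table; the rest is illustrative.
import itertools
import torch
import torch.nn.functional as F

GRID = {
    "learning_rate": [5e-5],                        # quoted
    "max_seq_len": [32],                            # quoted
    "dropout_rate": [0.8, 0.825, 0.85, 0.875, 0.9], # continuous-perturbation sweep (quoted)
    "decoder_layers": [12, 14, 16],                 # decoder depth sweep (quoted)
    "temperature": [0.03],                          # contrastive temperature (quoted)
}


def configs():
    """Enumerate all hyperparameter combinations in the sweep."""
    keys = list(GRID)
    for values in itertools.product(*GRID.values()):
        yield dict(zip(keys, values))


def contrastive_loss(z1, z2, temperature=0.03):
    """InfoNCE over in-batch negatives; z1 and z2 are two views of the same batch."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature      # (batch, batch) cosine-similarity matrix
    labels = torch.arange(z1.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

# Selection rule quoted above: train one checkpoint per config and keep the one
# with the highest Spearman correlation on the STS-Benchmark development set.
```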