DenoSent: A Denoising Objective for Self-Supervised Sentence Representation Learning

Authors: Xinghao Wang, Junliang He, Pengyu Wang, Yunhua Zhou, Tianxiang Sun, Xipeng Qiu

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical evaluations demonstrate that this approach delivers competitive results on both semantic textual similarity (STS) and a wide range of transfer tasks, standing up well in comparison to contrastive-learning-based methods.
Researcher Affiliation | Academia | School of Computer Science, Fudan University {xinghaowang22, jlhe22, pywang22}@m.fudan.edu.cn, {zhouyh20, txsun19, xpqiu}@fudan.edu.cn
Pseudocode | No | The paper describes the model architecture and processes in textual form and through diagrams, but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Our code is available at https://github.com/xinghaow99/DenoSent.
Open Datasets | Yes | We use the unsupervised Wiki dataset used in SimCSE (Gao, Yao, and Chen 2021) as our self-supervised training dataset. For back-translation data augmentation, we use pre-trained machine translation models (Tiedemann and Thottingal 2020) to translate the training sentences to Chinese and then translate them back to English. (A back-translation sketch is given after the table.)
Dataset Splits | Yes | We perform a sweep on these parameters and then select the checkpoint that has the highest Spearman correlation on the STS-Benchmark development set for evaluation. (A checkpoint-selection sketch is given after the table.)
Hardware Specification | Yes | We conduct all the experiments on a machine with 8 NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies | No | The paper mentions using 'bert-base-uncased', the AdamW optimizer, and toolkits such as SentEval and MTEB, but it does not specify version numbers for general software dependencies such as Python, PyTorch, TensorFlow, or the mentioned toolkits.
Experiment Setup | Yes | We use a learning rate of 5e-5 and AdamW (Loshchilov and Hutter 2017) as the optimizer. For the input sequence length, we use a value of 32. For the denoising objective, we use {0.8, 0.825, 0.85, 0.875, 0.9} as the dropout rates for continuous perturbations and {12, 14, 16} as the number of decoder transformer layers, perform a sweep on these parameters, and then select the checkpoint that has the highest Spearman correlation on the STS-Benchmark development set for evaluation. We use 0.825 as the dropout rate and 16 transformer layers for reported results. For the contrastive objective, we use a temperature τ = 0.03. (A training-setup sketch is given after the table.)
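
To make the back-translation augmentation from the Open Datasets row concrete, the sketch below round-trips English sentences through Chinese with OPUS-MT models (Tiedemann and Thottingal 2020). The specific Hugging Face checkpoint names and the batching details are assumptions for illustration; the paper does not state which exact models or code were used.

```python
# Minimal back-translation sketch (en -> zh -> en). The OPUS-MT checkpoint
# names below are assumed for illustration; the paper only cites Tiedemann
# and Thottingal (2020) for the pre-trained translation models.
from transformers import MarianMTModel, MarianTokenizer

def load_pair(name):
    return MarianTokenizer.from_pretrained(name), MarianMTModel.from_pretrained(name)

def translate(sentences, tokenizer, model):
    batch = tokenizer(sentences, return_tensors="pt", padding=True, truncation=True)
    outputs = model.generate(**batch)
    return tokenizer.batch_decode(outputs, skip_special_tokens=True)

en_zh = load_pair("Helsinki-NLP/opus-mt-en-zh")  # assumed checkpoint
zh_en = load_pair("Helsinki-NLP/opus-mt-zh-en")  # assumed checkpoint

sentences = ["A denoising objective for sentence representation learning."]
chinese = translate(sentences, *en_zh)       # English -> Chinese
paraphrases = translate(chinese, *zh_en)     # Chinese -> English augmentation
print(paraphrases)
```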
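The Dataset Splits and Experiment Setup rows describe selecting the best checkpoint by Spearman correlation on the STS-Benchmark development set. Below is a minimal sketch of that selection criterion; the `encode` callable and the dev-set tuples are placeholders, not the authors' evaluation harness (the paper mentions toolkits such as SentEval and MTEB).

```python
# Sketch of checkpoint selection: score each swept configuration by Spearman
# correlation between embedding cosine similarities and gold STS-B dev labels.
# `encode` and `dev_pairs` are placeholders, not the released evaluation code.
import numpy as np
from scipy.stats import spearmanr

def stsb_dev_spearman(encode, dev_pairs):
    """encode: list[str] -> np.ndarray embeddings; dev_pairs: (s1, s2, gold score)."""
    s1, s2, gold = zip(*dev_pairs)
    e1, e2 = encode(list(s1)), encode(list(s2))
    cosine = np.sum(e1 * e2, axis=1) / (
        np.linalg.norm(e1, axis=1) * np.linalg.norm(e2, axis=1)
    )
    return spearmanr(cosine, gold).correlation

def select_best_checkpoint(encoders_by_checkpoint, dev_pairs):
    scores = {ckpt: stsb_dev_spearman(enc, dev_pairs)
              for ckpt, enc in encoders_by_checkpoint.items()}
    return max(scores, key=scores.get), scores
```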
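The Experiment Setup row reports concrete hyperparameters: AdamW at a 5e-5 learning rate, input length 32, and a contrastive temperature of τ = 0.03. The sketch below wires those values into a generic BERT-based setup with an in-batch InfoNCE loss; the pooling strategy and loss formulation are assumptions, not the released DenoSent code, and the denoising decoder is omitted.

```python
# Hedged sketch of the reported hyperparameters: AdamW (lr 5e-5), max sequence
# length 32, and an InfoNCE-style contrastive loss with temperature 0.03.
# Pooling via the [CLS] token is an assumption; the denoising decoder with
# 16 transformer layers and 0.825 dropout is not reproduced here.
import torch
import torch.nn.functional as F
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
optimizer = torch.optim.AdamW(encoder.parameters(), lr=5e-5)
MAX_LEN, TEMPERATURE = 32, 0.03

def embed(sentences):
    batch = tokenizer(sentences, max_length=MAX_LEN, truncation=True,
                      padding=True, return_tensors="pt")
    return encoder(**batch).last_hidden_state[:, 0]  # [CLS] pooling (assumed)

def contrastive_loss(z1, z2, temperature=TEMPERATURE):
    """In-batch InfoNCE: positives on the diagonal, other sentences as negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.T / temperature
    labels = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, labels)
```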