Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings

Authors: Yiren Jian, Chongyang Gao, Soroush Vosoughi

NeurIPS 2022

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on 7 semantic textual similarity benchmarks reveal that models trained with the additional non-linguistic (images/audio) contrastive objective lead to higher-quality sentence embeddings. |
| Researcher Affiliation | Academia | Yiren Jian, Department of Computer Science, Dartmouth College, EMAIL; Chongyang Gao, Department of Computer Science, Northwestern University, EMAIL; Soroush Vosoughi, Department of Computer Science, Dartmouth College, EMAIL |
| Pseudocode | Yes | We provide the pseudo-code of our algorithm VisualCSE in the style of PyTorch in Algorithm 1. |
| Open Source Code | Yes | The code is available at https://github.com/yiren-jian/NonLing-CSE. |
| Open Datasets | Yes | For learning with L_text, we use 10^6 sentences down-sampled from the Wikipedia English dataset for unsupervised sentence embedding learning (Eq. 2). For supervised sentence embedding learning (Eq. 3), we (and SimCSE) use a combined NLI dataset with 314K sentences with paired examples labeled as entailment, neutral, and non-entailment. For learning with L_image, both unsupervised and supervised sentence embedding settings use a down-sampled ImageNet dataset S_image. |
| Dataset Splits | Yes | The models are selected based on the validation set of the STS-Benchmark. |
| Hardware Specification | Yes | All the unsupervised base LMs are trained on 24GB Nvidia RTX-6000 GPUs, while supervised and large models are trained on 48GB Nvidia RTX-A6000 GPUs. |
| Software Dependencies | Yes | We use pytorch-1.10 with CUDA 11.3, torchvision-0.11.3, torchaudio-0.10.2, and Huggingface transformers-4.5.0 for our implementation. |
| Experiment Setup | Yes | Following SimCSE [13], we train unsupervised models with AdamW for one epoch, and supervised models for 3 epochs. We search batch sizes and learning rates from {64, 128, 256} and {1e-5, 2e-5, 3e-5} for L_text. Moreover, we use a fixed batch size of 48 for L_image (and L_audio) and search learning rates among {5e-6, 2e-6, 1e-6, 5e-7, 2e-7, 1e-7}. |
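For reference, the pinned dependency versions and hyperparameter grids quoted in the table can be collected into a small sketch. This is an illustrative aid, not code from the paper's repository; the variable and function names are our own, and only the version strings and grid values are taken from the quoted excerpts.

```python
from itertools import product

# Pinned versions quoted in the Software Dependencies row
# (pytorch-1.10 was used with CUDA 11.3).
REQUIREMENTS = [
    "torch==1.10",
    "torchvision==0.11.3",
    "torchaudio==0.10.2",
    "transformers==4.5.0",
]

# Hyperparameter grids quoted in the Experiment Setup row.
TEXT_BATCH_SIZES = [64, 128, 256]
TEXT_LEARNING_RATES = [1e-5, 2e-5, 3e-5]
IMAGE_BATCH_SIZE = 48  # fixed for the L_image (and L_audio) objective
IMAGE_LEARNING_RATES = [5e-6, 2e-6, 1e-6, 5e-7, 2e-7, 1e-7]


def text_search_space():
    """All (batch_size, learning_rate) pairs searched for L_text."""
    return list(product(TEXT_BATCH_SIZES, TEXT_LEARNING_RATES))
```

The full grid for the text objective is thus 3 × 3 = 9 configurations per model, while the image/audio objective fixes the batch size and searches only its 6 learning rates.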