When is an Embedding Model More Promising than Another?
Authors: Maxime Darrin, Philippe Formont, Ismail Ben Ayed, Jackie CK Cheung, Pablo Piantanida
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate experimentally that our approach aligns closely with the capability of embedding models to facilitate various downstream tasks in both natural language processing and molecular biology. This effectively offers practitioners a valuable tool for prioritizing model trials. |
| Researcher Affiliation | Academia | Maxime Darrin (1,2,3,4), Philippe Formont (1,2,4,5), Ismail Ben Ayed (1,5), Jackie Chi Kit Cheung (2,3), Pablo Piantanida (1,2,4,6). 1: International Laboratory on Learning Systems; 2: Mila Quebec AI Institute; 3: McGill University; 4: Université Paris-Saclay; 5: ÉTS Montréal; 6: CNRS, CentraleSupélec |
| Pseudocode | Yes | Procedure 1: Estimation of IS(U → Z); GM_{µ,Σ,w} denotes the Gaussian Mixture model with means µ, covariances Σ, and weights w. (An illustrative GMM-based sketch follows the table.) |
| Open Source Code | Yes | The code used to perform all experiments is available at https://github.com/ills-montreal/emir |
| Open Datasets | Yes | We used them to extract embeddings for many different datasets from the MTEB benchmark such as Banking77 [19], SICK-R [122], Amazon Polarity [72], SNLI [120] and IMDB [70]. |
| Dataset Splits | Yes | Datasets collected are split into a training, validation, and test set, following the scaffold-split strategy, further described in Sec. D.3. (An illustrative scaffold-split sketch follows the table.) |
| Hardware Specification | Yes | All our experiments were conducted on NVIDIA V100 and NVIDIA A6000 GPUs. |
| Software Dependencies | No | The paper mentions "ADAM [56]" as an optimizer and the "RDKit and Datamol toolkits [61, 71]" but does not specify version numbers for these or other key software dependencies required for reproducibility. |
| Experiment Setup | Yes | All the downstream tasks are trained in the exact same way. We use a dense classifier with two hidden layers of dimension 256, train for two epochs using ADAM [56] with a learning rate of 10^-3 on the official training set, and evaluate on either the validation or test set when available (with respect to the Huggingface datasets). (An illustrative probe-training sketch follows the table.) |
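
The pseudocode row refers to Procedure 1, which estimates IS(U → Z) using Gaussian Mixture models. The snippet below is an illustrative stand-in rather than the authors' exact procedure (see the paper and the linked repository for that): it approximates an information-sufficiency-style score by a plug-in estimate mean[log q(u,z) − log q(u) − log q(z)], with every density q modelled by a scikit-learn `GaussianMixture`. The number of mixture components and the choice to fit and score on the same samples are assumptions made to keep the sketch short.

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def _gmm_log_density(fit_x, score_x, n_components=8, seed=0):
    """Fit a Gaussian Mixture on `fit_x` and return per-sample log-densities of `score_x`."""
    gm = GaussianMixture(n_components=n_components, covariance_type="full",
                         random_state=seed).fit(fit_x)
    return gm.score_samples(score_x)


def information_sufficiency_sketch(U, Z, n_components=8):
    """Illustrative plug-in estimate of IS(U -> Z) between two embedding matrices.

    NOT the paper's Procedure 1: IS is approximated here by
    mean[log q(u, z) - log q(u) - log q(z)] with GMM density models.
    Fitting and scoring on the same samples biases the estimate; scoring a
    held-out split would be preferable in practice.
    """
    joint = np.concatenate([U, Z], axis=1)
    log_q_uz = _gmm_log_density(joint, joint, n_components)
    log_q_u = _gmm_log_density(U, U, n_components)
    log_q_z = _gmm_log_density(Z, Z, n_components)
    return float(np.mean(log_q_uz - log_q_u - log_q_z))
```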
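The dataset-splits row only references the scaffold-split strategy (detailed in the paper's Sec. D.3). The sketch below shows a common Bemis-Murcko scaffold split using RDKit; the split fractions and the greedy largest-group-first assignment are assumptions, not the paper's exact protocol.

```python
from collections import defaultdict
from rdkit.Chem.Scaffolds import MurckoScaffold


def scaffold_split(smiles_list, frac_train=0.8, frac_valid=0.1):
    """Group molecules by Bemis-Murcko scaffold so no scaffold spans two splits.

    Fractions and the greedy assignment order are assumptions; the paper's
    exact protocol is described in its Sec. D.3.
    """
    groups = defaultdict(list)
    for idx, smi in enumerate(smiles_list):
        scaffold = MurckoScaffold.MurckoScaffoldSmiles(smiles=smi, includeChirality=False)
        groups[scaffold].append(idx)

    train, valid, test = [], [], []
    n = len(smiles_list)
    # Assign whole scaffold groups, largest first, until each split is full.
    for group in sorted(groups.values(), key=len, reverse=True):
        if len(train) + len(group) <= frac_train * n:
            train.extend(group)
        elif len(valid) + len(group) <= frac_valid * n:
            valid.extend(group)
        else:
            test.extend(group)
    return train, valid, test
```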
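The experiment-setup row pins down most of the downstream probe: two hidden layers of width 256, two epochs of ADAM at learning rate 10^-3. The PyTorch sketch below fills in the unstated details (ReLU activations, batch size 256, cross-entropy loss) as assumptions.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def train_probe(embeddings, labels, n_classes, epochs=2, lr=1e-3, batch_size=256):
    """Dense classifier probe matching the quoted setup.

    The two hidden layers of width 256, two epochs, and Adam at lr 1e-3 come
    from the paper; ReLU activations, batch size, and cross-entropy loss are
    assumptions. `embeddings` is a float tensor, `labels` a long tensor.
    """
    model = nn.Sequential(
        nn.Linear(embeddings.shape[1], 256), nn.ReLU(),
        nn.Linear(256, 256), nn.ReLU(),
        nn.Linear(256, n_classes),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(TensorDataset(embeddings, labels),
                        batch_size=batch_size, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss_fn(model(x), y).backward()
            optimizer.step()
    return model
```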