Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Model Comparison for Semantic Grouping
Authors: Francisco Vargas, Kamen Brestnichki, Nils Hammerla
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve competitive results by applying the proposed framework with an appropriate choice of likelihood on the STS datasets. (Section 5, Experiments) We assess our methods' performance on the Semantic Textual Similarity (STS) datasets. |
| Researcher Affiliation | Industry | Francisco Vargas¹, Kamen Brestnichki¹, Nils Hammerla¹. ¹Babylon Health. Correspondence to: Francisco Vargas <EMAIL>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code builds on top of SentEval (Conneau & Kiela, 2018) and is available at https://github.com/Babylonpartners/MCSG. |
| Open Datasets | Yes | We assess our methods' performance on the Semantic Textual Similarity (STS) datasets (Agirre et al., 2012; 2013; 2014; 2015; 2016). |
| Dataset Splits | No | The paper mentions using the STS datasets (Agirre et al., 2012; 2013; 2014; 2015; 2016) for experiments but does not explicitly provide specific train/validation/test splits, percentages, or sample counts. |
| Hardware Specification | No | The paper does not provide specific hardware details (like CPU/GPU models, memory, or cloud instance types) used for running experiments. |
| Software Dependencies | No | The paper mentions 'SentEval (Conneau & Kiela, 2018)' as a software dependency but does not provide a specific version number for it or any other software component. |
| Experiment Setup | No | The paper discusses model choices and some data preprocessing steps (padding sentences), but it does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, number of epochs, optimizer settings) needed for reproduction. |