Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Invariance and identifiability issues for word embeddings
Authors: Rachel Carrington, Karthik Bharath, Simon Preston
NeurIPS 2019 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide a formal treatment of the above identifiability issue, present some numerical examples, and discuss possible resolutions. |
| Researcher Affiliation | Academia | Rachel Carrington Karthik Bharath Simon Preston School of Mathematical Sciences, University of Nottingham EMAIL |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described in this paper. It references existing models/corpora like GloVe and word2vec, but not its own implementation code. |
| Open Datasets | Yes | The embedding is from model (3), with X taken to be a document term matrix computed from the Corpus of Historical American English [Davies, 2012]... V is a GloVe embedding with d = 300 trained on Wikipedia 2014 + Gigaword 5 corpus... word2vec embeddings trained on the 100-billion word Google News corpus |
| Dataset Splits | No | The paper mentions using specific test sets but does not provide details on training, validation, or test splits (e.g., percentages or sample counts) for the datasets it uses. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions "R's optim implementation of the Nelder-Mead method" but does not specify version numbers for R or its `optim` function. |
| Experiment Setup | No | The paper describes using the Nelder-Mead method for optimization and specifies embedding dimensions (e.g., d=300). However, it does not provide concrete hyperparameters (e.g., learning rate, batch size, number of epochs) or specific system-level training settings for its experiments. |