Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Learning distributed representations with efficient SoftMax normalization

Authors: Lorenzo Dall'Amico, Enrico Maria Belliardo

TMLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show on some pre-trained embedding datasets that the proposed estimation method achieves higher or comparable accuracy with competing methods. From this result, we design an efficient and task-agnostic algorithm that learns the embeddings by optimizing the cross entropy between the softmax and a set of probability distributions given as inputs. The proposed algorithm is interpretable and easily adapted to arbitrary embedding problems. We consider a few use cases and observe similar or higher performances and a lower computational time than similar 2Vec algorithms. (Abstract)
Researcher Affiliation | Academia | Lorenzo Dall'Amico, ISI Foundation; Enrico Maria Belliardo, ISI Foundation
Pseudocode | Yes | Algorithm 1 EDRep. Input: P ∈ R^{n×n}, probability matrix encoding similarities; d, embedding dimension; ℓ ∈ {1, …, κ}^n, node label vector; η0, learning rate; n_epochs, number of training epochs. Output: X ∈ R^{n×d}, embedding matrix
Open Source Code | Yes | A Python implementation of our algorithm is available at github.com/lorenzodallamico/EDRep.
Open Datasets | Yes | We consider 6 datasets taken from the NLPL word embeddings repository (Kutuzov et al., 2017), representing word embeddings obtained with different algorithms and trained on different corpora: ... The datasets can be found at http://vectors.nlpl.eu/repository/ and are shared under the CC BY 4.0 license. (Section 2.2) ... The data are shared under the Creative Commons Public Domain Dedication license and can be downloaded at http://www.sociopatterns.org/datasets/sfhh-conference-data-set/. (Section D.2)
Dataset Splits | Yes | We then train a logistic regression classifier on the embedding cosine similarities with the 70% of the labeled data and test it on the remaining 30% of the data.
Hardware Specification | Yes | All codes are run on a Dell Inspiron laptop with 16 GB of RAM and with a processor 11th Gen Intel Core i7-11390H @ 3.40GHz × 8.
Software Dependencies | No | The paper mentions "A Python implementation of our algorithm is available at github.com/lorenzodallamico/EDRep." but does not provide specific version numbers for Python or any libraries/frameworks used.
Experiment Setup | Yes | The embedding algorithms are run with the same initial condition and parameters: η0 = 0.7, d = 32, n_epochs = 25 (for the first two plots). (Section 3.4) The green circles refer to the EDRep algorithm with d = 32, κ = 1 and w = 3. (Section 4.1) Both embeddings have dimension d = 200. (Section 4.2) We run the SIR model letting all nodes to be in the S state at the beginning of the simulation and having one infected node. The experiment is run with β = 0.15 and µ = 0.01. (Section 4.3)
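
For readers unfamiliar with the objective quoted in the Research Type row, the following is a minimal sketch of a cross entropy between target probability rows and a softmax over embedding dot products. It is not the authors' implementation: it evaluates the full softmax naively and does not use the efficient normalization estimate the paper proposes. The function names, the use of dot-product scores, and the inclusion of the diagonal term are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax of a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy_loss(P, X):
    """Mean cross entropy between rows of P and rows of softmax(X X^T).

    P: n x n list of lists whose rows sum to 1 (target distributions).
    X: n x d list of lists (embedding vectors).
    Assumption: scores are plain dot products and every index j is kept
    in the normalization. This is the naive O(n^2 d) evaluation.
    """
    n = len(X)
    loss = 0.0
    for i in range(n):
        scores = [sum(a * b for a, b in zip(X[i], X[j])) for j in range(n)]
        q = softmax(scores)
        # Skip zero entries of P so that 0 * log(q) never touches log(0).
        loss -= sum(P[i][j] * math.log(q[j]) for j in range(n) if P[i][j] > 0)
    return loss / n


# Hypothetical toy input: three items, two-dimensional zero embeddings.
P_toy = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.5], [0.5, 0.5, 0.0]]
X_zero = [[0.0, 0.0] for _ in range(3)]
loss_at_zero = cross_entropy_loss(P_toy, X_zero)
```

With all-zero embeddings the softmax is uniform, so the loss equals log(n) regardless of P, which makes a convenient sanity check before any optimization is attempted.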
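
The SIR setup quoted in the Experiment Setup row (all nodes susceptible except one infected node, β = 0.15, µ = 0.01) can be sketched as below. This is an illustrative simulation under stated assumptions, not the paper's code: it uses a static graph, treats β and µ as per-step probabilities, and updates all nodes synchronously. The function names and the ring graph are hypothetical.

```python
import random


def count_states(state):
    """Return (S, I, R) counts for a {node: state} mapping."""
    values = list(state.values())
    return values.count("S"), values.count("I"), values.count("R")


def simulate_sir(adjacency, beta=0.15, mu=0.01, seed_node=0,
                 max_steps=10_000, rng=None):
    """Discrete-time SIR on a static graph given as {node: [neighbors]}.

    Assumptions: each step, every infected node infects each susceptible
    neighbor with probability beta and then recovers with probability mu.
    Returns the list of (S, I, R) counts, starting with the initial state.
    """
    rng = rng or random.Random(0)
    state = {v: "S" for v in adjacency}
    state[seed_node] = "I"  # a single infected node at the start
    history = [count_states(state)]
    for _ in range(max_steps):
        infected = [v for v, s in state.items() if s == "I"]
        if not infected:
            break  # the epidemic has died out
        new_state = dict(state)  # synchronous update from the old state
        for v in infected:
            for u in adjacency[v]:
                if state[u] == "S" and rng.random() < beta:
                    new_state[u] = "I"
            if rng.random() < mu:
                new_state[v] = "R"
        state = new_state
        history.append(count_states(state))
    return history


# Hypothetical contact structure: a ring of 20 nodes.
n_nodes = 20
ring = {i: [(i - 1) % n_nodes, (i + 1) % n_nodes] for i in range(n_nodes)}
history = simulate_sir(ring, rng=random.Random(42))
```

Passing an explicit `random.Random` instance keeps runs repeatable, which matters when comparing epidemic curves across embeddings or parameter settings.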