Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Gaussian Embedding of Linked Documents from a Pretrained Semantic Space
Authors: Antoine Gourru, Julien Velcin, Julien Jacques
IJCAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that our representations outperform or match most of the recent methods in classification and link prediction on three datasets (two citation networks and a corpus of news articles) in Section 4. |
| Researcher Affiliation | Academia | Antoine Gourru¹, Julien Velcin¹ and Julien Jacques¹; ¹Université de Lyon, Lyon 2, ERIC UR3083 EMAIL |
| Pseudocode | Yes | Algorithm 1: GELD Algorithm. Input: D, U. Parameters: η, λ, k. Output: µ, σ² |
| Open Source Code | Yes | We provide the implementation of GELD and the evaluation datasets to the community (https://github.com/AntoineGourru/DNEmbedding). |
| Open Datasets | Yes | Cora [Tu et al., 2017] and Dblp [Tang et al., 2008; Pan et al., 2016] are two citation networks. Additionally, we use the Nyt dataset from [Gourru et al., 2020] containing press articles from January 2007. |
| Dataset Splits | Yes | Train/Test ratio: Cora 10%, 50%; Dblp 10%, 50%; Nyt 10%, 50% |
| Hardware Specification | Yes | We run all the experiments in parallel with 20 physical cores (Intel® Xeon® CPU E5-2640 v4 @ 2.40GHz) and 96GB of RAM. |
| Software Dependencies | No | The paper mentions 'scikit-learn package' and 'gensim' but does not specify their version numbers. It only states 'implemented in gensim'. |
| Experiment Setup | Yes | Similarly, we report the optimal parameters for GELD obtained via grid-search on the classification task: δ = 0.1, γ = 0.2, η = 0.99 for Cora, η = 0.8 for Dblp and η = 0.95 for Nyt. To learn word vectors, we adopt Skip-gram with negative sampling [Mikolov et al., 2013] implemented in gensim. We use window size of 15 for Cora, 10 for Nyt, 5 for DBLP (depending on documents size), and 5 negative examples for both. |
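
The settings quoted in the Experiment Setup row can be gathered into a small configuration object, which may help anyone trying to rerun the experiments. The following is a minimal sketch, not the authors' code: the names `GeldSetup`, `SETUPS`, and `skipgram_kwargs` are hypothetical, and only the numeric values (δ, γ, η, window sizes, 5 negative examples) come from the quoted text. The embedding dimension is not stated in the excerpt, so it is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeldSetup:
    """Per-dataset settings as quoted in the Experiment Setup row (hypothetical container)."""
    eta: float           # η, found by grid search on the classification task
    window: int          # Skip-gram context window size
    delta: float = 0.1   # δ, shared across datasets
    gamma: float = 0.2   # γ, shared across datasets
    negative: int = 5    # number of negative examples for Skip-gram


# Values copied from the quoted setup; the dictionary keys are illustrative.
SETUPS = {
    "Cora": GeldSetup(eta=0.99, window=15),
    "Dblp": GeldSetup(eta=0.8, window=5),
    "Nyt": GeldSetup(eta=0.95, window=10),
}


def skipgram_kwargs(dataset: str) -> dict:
    """Return keyword arguments for a Skip-gram model with negative sampling.

    The keys match gensim's Word2Vec parameters (sg=1 selects Skip-gram),
    but nothing here imports or requires gensim.
    """
    setup = SETUPS[dataset]
    return {"sg": 1, "window": setup.window, "negative": setup.negative}
```

With gensim installed, the returned dictionary could be passed as `Word2Vec(sentences=tokenized_docs, **skipgram_kwargs("Cora"))`, where `tokenized_docs` is an assumed list of token lists. Because the scorecard notes that no library versions are given, results may differ across gensim releases.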