Gaussian Embedding of Linked Documents from a Pretrained Semantic Space

Authors: Antoine Gourru, Julien Velcin, Julien Jacques

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that our representations outperform or match most of the recent methods in classification and link prediction on three datasets (two citation networks and a corpus of news articles) in Section 4."
Researcher Affiliation | Academia | "Antoine Gourru1, Julien Velcin1 and Julien Jacques1, 1Université de Lyon, Lyon 2, ERIC UR3083, {antoine.gourru, julien.velcin, julien.jacques}@univ-lyon2.fr"
Pseudocode | Yes | "Algorithm 1 GELD Algorithm. Input: D, U. Parameters: η, λ, k. Output: µ, σ²" (an illustrative input/output sketch follows the table)
Open Source Code | Yes | "We provide the implementation of GELD and the evaluation datasets to the community (https://github.com/AntoineGourru/DNEmbedding)."
Open Datasets | Yes | "Cora [Tu et al., 2017] and Dblp [Tang et al., 2008; Pan et al., 2016] are two citation networks. Additionally, we use the Nyt dataset from [Gourru et al., 2020] containing press articles from January 2007."
Dataset Splits | Yes | Train/Test ratios of 10% and 50% are reported for each of Cora, Dblp and Nyt (an evaluation sketch using these ratios follows the table)
Hardware Specification | Yes | "We run all the experiments in parallel with 20 physical cores (Intel(R) Xeon(R) CPU E5-2640 v4 @ 2.40GHz) and 96GB of RAM."
Software Dependencies | No | The paper mentions the 'scikit-learn package' and 'gensim' but does not specify their version numbers; it only states 'implemented in gensim'.
Experiment Setup | Yes | "Similarly, we report the optimal parameters for GELD obtained via grid-search on the classification task: δ = 0.1, γ = 0.2, η = 0.99 for Cora, η = 0.8 for Dblp and η = 0.95 for Nyt. To learn word vectors, we adopt Skip-gram with negative sampling [Mikolov et al., 2013] implemented in gensim. We use a window size of 15 for Cora, 10 for Nyt, 5 for Dblp (depending on document size), and 5 negative examples for both." (a minimal gensim sketch of this pretraining step follows the table)
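
The pretraining step quoted in the Experiment Setup row (Skip-gram with negative sampling in gensim) can be illustrated in a few lines. This is a minimal sketch, assuming gensim 4.x (the `vector_size` keyword); the toy corpus, embedding dimension, seed and worker count are illustrative assumptions, not values from the paper.

```python
from gensim.models import Word2Vec

# Hypothetical tokenized corpus standing in for the documents of one dataset.
toy_corpus = [
    ["gaussian", "embedding", "of", "linked", "documents"],
    ["skip", "gram", "with", "negative", "sampling"],
    ["pretrained", "semantic", "space"],
]

# Skip-gram with negative sampling, mirroring the quoted setup for Cora.
model = Word2Vec(
    sentences=toy_corpus,
    sg=1,             # Skip-gram (sg=0 would be CBOW)
    negative=5,       # 5 negative examples, as reported in the paper
    window=15,        # 15 for Cora; the paper uses 10 for Nyt and 5 for Dblp
    vector_size=100,  # dimension is an assumption; it is not given in this excerpt
    min_count=1,      # keep every token of the tiny toy corpus
    workers=4,        # illustrative; the hardware row reports 20 physical cores
    seed=1,
)

# model.wv holds the pretrained word vectors, i.e. the semantic space U
# consumed by GELD (Algorithm 1).
word_vectors = model.wv
```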
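
The Pseudocode row only exposes Algorithm 1's signature: documents D and pretrained word vectors U go in, and a mean µ and variance σ² per document come out. The sketch below is not the GELD update rule (which is not quoted here); it merely illustrates that input/output shape by fitting a naive Gaussian to each document's word vectors. All names are hypothetical.

```python
import numpy as np

def gaussian_per_document(D, U, dim=100):
    """Illustrative stand-in for Algorithm 1's interface: returns one
    (mu, sigma2) pair per document. D maps document ids to token lists and
    U maps tokens to vectors (e.g. gensim's model.wv or a plain dict).
    This is NOT the GELD update, only a placeholder with the same signature."""
    mu, sigma2 = {}, {}
    for doc_id, tokens in D.items():
        vectors = np.array([U[t] for t in tokens if t in U])
        if vectors.size == 0:
            vectors = np.zeros((1, dim))
        mu[doc_id] = vectors.mean(axis=0)      # document mean in the word space
        sigma2[doc_id] = float(vectors.var())  # single spherical variance
    return mu, sigma2
```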
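
The Dataset Splits row indicates that downstream classification is evaluated with 10% and 50% of the documents used for training. Below is a minimal sketch of that protocol with scikit-learn (the package named in the Software Dependencies row); the random embeddings, number of classes and logistic-regression classifier are assumptions made only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(1000, 100))  # stand-in for the learned document means mu
labels = rng.integers(0, 7, size=1000)     # stand-in for document class labels

for train_ratio in (0.10, 0.50):           # the 10% / 50% ratios from the table
    X_tr, X_te, y_tr, y_te = train_test_split(
        embeddings, labels, train_size=train_ratio, random_state=0, stratify=labels
    )
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    micro_f1 = f1_score(y_te, clf.predict(X_te), average="micro")
    print(f"train ratio {train_ratio:.0%}: micro-F1 = {micro_f1:.3f}")
```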