Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Scalable Feature Learning on Huge Knowledge Graphs for Downstream Machine Learning

Authors: Félix Lefebvre, Gaël Varoquaux

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate SEPAL on 7 large-scale knowledge graphs and 46 downstream machine learning tasks. Our results show that SEPAL significantly outperforms previous methods on downstream tasks. In addition, SEPAL scales up its base embedding model, enabling fitting huge knowledge graphs on commodity hardware. Our code is available at: https://github.com/soda-inria/sepal. In this paper, we evaluate SEPAL's performance on knowledge graphs of increasing size between YAGO3 [2.6M entities, Mahdisoltani et al., 2014] and WikiKG90Mv2 [91M entities, Hu et al., 2020]; we study the use of the embeddings for feature enrichment on 46 downstream machine learning tasks, showing that SEPAL makes embedding methods more tractable while generating better embeddings for downstream tasks.
Researcher Affiliation	Academia	Félix Lefebvre, SODA Team, Inria Saclay, EMAIL; Gaël Varoquaux, SODA Team, Inria Saclay, Probabl
Pseudocode	Yes	Algorithm 1: BLOCS. Input: Graph G = (V, E) with nodes V and edges E, hyperparameters h and m. Output: List of overlapping connected subgraphs. S: list of subgraphs; U ← V: set of unassigned nodes
Open Source Code	Yes	Our code is available at: https://github.com/soda-inria/sepal.
Open Datasets	Yes	Knowledge graph datasets. To compare large knowledge graphs of different sizes, we use Freebase [Bollacker et al., 2008], WikiKG90Mv2 [an extract of Wikidata; Hu et al., 2020], and three generations of YAGO: YAGO3 [Mahdisoltani et al., 2014], YAGO4 [Pellissier Tanon et al., 2020], and YAGO4.5 [Suchanek et al., 2024].
Dataset Splits	Yes	We randomly split each dataset into training (90%), validation (5%), and test (5%) subsets of triples. During stratification, we ensure that the train graph remains connected by moving as few triples as required from the validation/test sets to the training set.
Hardware Specification	Yes	DistMult, DGL-KE, NodePiece, and SEPAL were trained on Nvidia V100 GPUs with 32 GB of memory, and 20 CPU nodes with 252 GB of RAM.
Software Dependencies	Yes	We use the PyKEEN [Ali et al., 2021b] implementation for DistMult and NodePiece, and the implementations provided by the authors for the others.
Experiment Setup	Yes	Validation/test split and hyperparameter tuning. We use 4 of the 42 WikiDBs tables as validation data: 2 for regression and 2 for classification tasks (see Figure 4). The remaining 38 WikiDBs tables, along with the 4 real-world tables, are used exclusively for testing. ... Optimizer for core training: we use the Adam optimizer with learning rate lr = 1 × 10^-3; Number p of negative samples per positive for core training: we use p = 100 (Table 9).
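
The Dataset Splits row describes a random 90% / 5% / 5% split of triples in which held-out triples are moved back into the training set until the training graph is connected. The sketch below is one possible way to do this and is not taken from the paper or the SEPAL repository. The function names (`split_triples`, `count_components`, `find`), the `seed` parameter, and the greedy union-find repair are all assumptions made for illustration. The greedy pass moves a held-out triple only when it joins two components that are still separate, so it moves no more triples than connectivity requires.

```python
import random


def find(parent, x):
    # Union-find lookup with path halving.
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def count_components(triples):
    # Number of connected components among the entities used by `triples`.
    parent = {}
    for h, _, t in triples:
        parent.setdefault(h, h)
        parent.setdefault(t, t)
    for h, _, t in triples:
        ra, rb = find(parent, h), find(parent, t)
        if ra != rb:
            parent[ra] = rb
    return len({find(parent, x) for x in parent})


def split_triples(triples, seed=0, frac_train=0.90, frac_val=0.05):
    # Hypothetical helper: random split followed by a connectivity repair.
    triples = list(triples)
    random.Random(seed).shuffle(triples)
    n_train = int(len(triples) * frac_train)
    n_val = int(len(triples) * frac_val)
    train = triples[:n_train]
    val = triples[n_train:n_train + n_val]
    test = triples[n_train + n_val:]

    # Every entity starts as its own component, including entities that
    # currently appear only in held-out triples.
    parent = {}
    for h, _, t in triples:
        parent.setdefault(h, h)
        parent.setdefault(t, t)

    def union(a, b):
        ra, rb = find(parent, a), find(parent, b)
        if ra == rb:
            return False
        parent[ra] = rb
        return True

    for h, _, t in train:
        union(h, t)

    # Move a held-out triple to train only if it merges two components.
    kept_val, kept_test = [], []
    for held_out, kept in ((val, kept_val), (test, kept_test)):
        for h, r, t in held_out:
            if union(h, t):
                train.append((h, r, t))
            else:
                kept.append((h, r, t))
    return train, kept_val, kept_test
```

If the full graph is connected, `count_components(train)` returns 1 after the split and every entity appears in at least one training triple. The final proportions then deviate slightly from 90/5/5, by the number of triples that were moved.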