Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Column-Oriented Datalog on the GPU
Authors: Yihao Sun, Sidharth Kumar, Thomas Gilray, Kristopher Micinski
AAAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a thorough evaluation. Our results show over 200× speedup compared to CPU-based column-oriented systems and 2.5× faster performance than other GPU prototypes in both standard Datalog and knowledge graph reasoning workloads. |
| Researcher Affiliation | Academia | Yihao Sun¹, Sidharth Kumar², Thomas Gilray³, Kristopher Micinski¹; ¹Syracuse University, ²University of Illinois at Chicago, ³Washington State University; EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: Binary Join on hash indexed DSM relations; Algorithm 2: Deduplication in FVLOG for a 2-arity relation |
| Open Source Code | No | The paper does not contain an explicit statement about the release of source code for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | The first column lists the names of the graphs used in this experiment, all of which come from the Sparse Suite (Davis and Hu 2011) dataset. We selected tuple-generating dependency (TGD) queries (excluding existential rules) on the LUBM dataset from the Chase Bench (Benedikt et al. 2017) |
| Dataset Splits | No | The paper mentions various datasets used for evaluation but does not provide specific details on how these datasets were split into training, validation, or test sets for reproducibility. |
| Hardware Specification | Yes | All our experiments were conducted on a server equipped with an AMD EPYC 9534 and an NVIDIA H100. The AMD EPYC 9534 features 64 cores and 128 threads, supported by 500 GB of memory with a memory bandwidth of 0.43 TB/s. The NVIDIA H100 GPU includes 16,896 CUDA cores and 80 GB of HBM3 memory, offering up to 3.3 TB/s of memory bandwidth. |
| Software Dependencies | Yes | For VLog, we used Rulewerk, a Java wrapper... For Nemo, we utilized version 0.5.1. We used Soufflé version 2.4.1, with maximum multithreading and compiler optimizations. The RDFox version used was 7.1a... All GPU tools were compiled with NVC++ in NVHPC 24.1. |
| Experiment Setup | Yes | We used Soufflé version 2.4.1, with maximum multithreading and compiler optimizations. The RDFox version used was 7.1a, with all CPU threads enabled. |