Auto-Differentiation of Relational Computations for Very Large Scale Machine Learning

Authors: Yuxin Tang, Zhimin Ding, Dimitrije Jankov, Binhang Yuan, Daniel Bourgeois, Chris Jermaine

ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show experimentally that a relational engine running an auto-differentiated relational algorithm can easily scale to very large datasets, and is competitive with state-of-the-art, special-purpose systems for large-scale distributed machine learning.
Researcher Affiliation | Academia | 1Department of Computer Science, Rice University, Houston, US; 2ETH Zurich, Switzerland.
Pseudocode | Yes | Algorithm 1 Chain Rule(v_i, v_j, Q_{R_j}, R_1, ..., R_k) ... Algorithm 2 RAAutoDiff(Q, In_1, In_2, ...). (An illustrative chain-rule sketch follows the table.)
Open Source Code | Yes | We provide a simple Python tool that can be used for RA auto-differentiation: https://github.com/anonymous-repo-33/relation-algebra-autodiff
Open Datasets | Yes | This GCN is benchmarked using the datasets in Table 1: ogbn-arxiv (0.2M, 1.1M), ogbn-products (0.1M, 39M), ogbn-papers100M (0.1B, 1.6B), friendster (65.6M, 3.6B) ... We train our KGE model on the Freebase data set. Freebase (Chah, 2017) contains 1.9 billion triples in RDF format.
Dataset Splits | Yes | We split the dataset into a training set (90%), a validation set (5%), and a testing set (5%).
Hardware Specification | Yes | Experiments are run on AWS, using m5.4xlarge instances with 20 cores, 64GB DDR4 memory, and 1TB general-purpose SSD.
Software Dependencies | Yes | DGL is built from scratch from the latest version, 0.9.
Experiment Setup | Yes | The Adam optimizer is used with learning rate η = 0.1; the dropout rate γ = 0.5; the hidden layer dimension D = 256; batch size B = 1024. (A hedged configuration sketch follows below.)
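The pseudocode row above refers to Algorithm 1 (Chain Rule) and Algorithm 2 (RAAutoDiff), which apply reverse-mode differentiation to a relational-algebra query graph. The sketch below is only a minimal illustration of the kind of chain-rule accumulation those algorithms perform, written over a tiny dense operator graph rather than over relations; the class and function names (Node, matmul, sum_all, backward) are hypothetical and do not come from the paper or its repository.

```python
# Minimal, illustrative reverse-mode chain-rule accumulation over a tiny
# operator graph. NOT the paper's relational-algebra implementation; it only
# shows the per-edge chain rule (Algorithm 1's role) and the whole-graph
# backward sweep (Algorithm 2's role) in dense-matrix form.
import numpy as np

class Node:
    def __init__(self, value, parents=(), vjps=()):
        self.value = value        # forward result
        self.parents = parents    # upstream nodes this node was computed from
        self.vjps = vjps          # one vector-Jacobian product per parent
        self.grad = None          # accumulated adjoint (gradient of the output)

def matmul(a, b):
    # Forward: C = A @ B.  Chain rule: dA = dC @ B^T, dB = A^T @ dC.
    return Node(a.value @ b.value,
                parents=(a, b),
                vjps=(lambda g: g @ b.value.T, lambda g: a.value.T @ g))

def sum_all(a):
    # Forward: scalar sum of all entries.  Chain rule: gradient broadcasts back.
    return Node(a.value.sum(), parents=(a,),
                vjps=(lambda g: g * np.ones_like(a.value),))

def backward(out):
    # Reverse sweep: apply each node's chain rule and accumulate adjoints into
    # its parents. A plain stack is a valid reverse order for this tree-shaped
    # example; a general DAG would need a reverse topological order.
    out.grad = np.ones_like(out.value)
    stack = [out]
    while stack:
        node = stack.pop()
        for parent, vjp in zip(node.parents, node.vjps):
            contrib = vjp(node.grad)
            parent.grad = contrib if parent.grad is None else parent.grad + contrib
            stack.append(parent)

# Usage: differentiate sum(A @ B) with respect to A and B.
A = Node(np.random.rand(3, 4))
B = Node(np.random.rand(4, 2))
loss = sum_all(matmul(A, B))
backward(loss)
print(A.grad.shape, B.grad.shape)  # (3, 4) (4, 2)
```

In the paper's setting the nodes are relational-algebra operators (joins, aggregations) over relations that encode matrix or tensor chunks, and the output of differentiation is itself a relational query that a database engine can optimize and execute at scale.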
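The experiment-setup row reports the hyperparameters used for the GCN benchmark. The snippet below is a minimal PyTorch sketch that wires those reported values (Adam, η = 0.1, dropout 0.5, hidden dimension 256, batch size 1024) into a generic two-layer GCN; the module names, the dense adjacency stand-in, and the input/output dimensions are assumptions for illustration only, since the paper's system expresses the same computation in relational algebra rather than in PyTorch.

```python
# Hedged sketch: reported hyperparameters applied to a generic two-layer GCN.
# Dimensions (in_dim=128, num_classes=40) and the dense A_hat are hypothetical.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    # Dense stand-in for a graph convolution: H' = A_hat @ (H @ W).
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Linear(in_dim, out_dim, bias=False)

    def forward(self, a_hat, h):
        return a_hat @ self.weight(h)

class GCN(nn.Module):
    def __init__(self, in_dim, hidden_dim=256, num_classes=40, dropout=0.5):
        super().__init__()
        self.layer1 = GCNLayer(in_dim, hidden_dim)    # D = 256
        self.layer2 = GCNLayer(hidden_dim, num_classes)
        self.dropout = nn.Dropout(dropout)            # gamma = 0.5

    def forward(self, a_hat, x):
        h = torch.relu(self.layer1(a_hat, x))
        h = self.dropout(h)
        return self.layer2(a_hat, h)

model = GCN(in_dim=128)
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)  # eta = 0.1
batch_size = 1024                                         # B = 1024
```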