Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers
Authors: Krzysztof M Choromanski, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Han Lin, Kumar Avinava Dubey, Tamas Sarlos, Snigdha Chaturvedi
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular low displacement rank) for integrating tensor fields defined on weighted trees. Several applications of the resulting fast tree-field integrators (FTFIs) are presented, including (a) approximation of graph metrics with tree metrics, (b) graph classification, (c) modeling on meshes, and finally (d) Topological Transformers (TTs) [Choromanski et al., 2022] for images. For Topological Transformers, we propose new relative position encoding (RPE) masking mechanisms with as few as three extra learnable parameters per Transformer layer, leading to 1.0-1.5%+ accuracy gains. Importantly, most of FTFIs are exact methods, thus numerically equivalent to their brute-force counterparts. When applied to graphs with thousands of nodes, those exact algorithms provide 5.7-13x speedups. We also provide an extensive theoretical analysis of our methods. (A brute-force reference computation of tree-field integration is sketched below the table.) |
| Researcher Affiliation | Collaboration | Krzysztof Choromanski (Google DeepMind, Columbia University), Arijit Sehanobish (Independent), Somnath Basu Roy Chowdhury (UNC Chapel Hill), Han Lin (UNC Chapel Hill), Avinava Dubey (Google Research), Tamas Sarlos (Google Research), Snigdha Chaturvedi (UNC Chapel Hill). |
| Pseudocode | Yes | Algorithm 1: General Efficient Low-Rank Masked Attention from Choromanski et al. [2022]. (A sketch of the low-rank masked-attention identity this algorithm relies on appears below the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/brcsomnath/FastTreeIntegrator. Specifically, we provide there the code for: (1) our algorithm leveraging the Integrator Tree data structure (depicted in Fig. 1), (2) adaptation to the Gromov-Wasserstein-type computation, (3) graph classification and (4) experiments on interpolation on meshes. |
| Open Datasets | Yes | We run experiments on ImageNet and Places365 datasets using ViT-B/16 (see Table 1). ...We also perform similar experiments for graph classification on the CUBES dataset Hanocka et al. [2019]. Specifically, we investigate how the polynomial degree affects the graph classification performance in Fig. 9 (left). Moreover, we benchmark FTFI on ModelNet10 (Wu et al. [2015]), a dataset for 3D Point Cloud (PC) classification. ...Applying FTFI with a topological masking mechanism to the ViViT architecture (factorized Transformer model variant, trained from scratch, as described in Arnab et al. [2021]) results in a 0.8% absolute improvement on the Kinetics dataset (Kay et al. [2017]). |
| Dataset Splits | Yes | We conduct graph classification experiments on a wide range of benchmark datasets. We report the dataset statistics for the graph classification datasets in Table 2. More details about the datasets are available in Morris et al. [2020]. To evaluate the performance of the different kernels, we employ the framework proposed by Errica et al. [2020]. In particular, 10-fold cross-validation is used to obtain an estimate of the generalization performance of our method and the baseline method. We repeat this cross-validation experiment 5 times to get a robust estimate and report the standard deviation for each setup. (The repeated 10-fold protocol is sketched in code below the table.) |
| Hardware Specification | Yes | All experiments were run on a computer with an i9-12900k CPU and 64GB memory. ...Batch size: 4096; Compute resources: 8×8 TPUv3 |
| Software Dependencies | No | The paper mentions using Python for its code and libraries such as the POT library for the baseline experiments, but does not specify exact version numbers for these software dependencies. |
| Experiment Setup | Yes | Table 6: Hyperparameters for Topological Transformer experiments. Activation layer: gelu; Dropout prob: 0.1; Attention dropout prob: 0.1; Optimizer: Adam; Learning rate: 10^-3; Batch size: 4096; Compute resources: 8×8 TPUv3; Number of epochs: 300; Warmup: 10K steps; Weight decay: 0.1; Learning schedule: cosine decay. (These settings are collected into a config sketch below the table.) |
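
For context on the core object quoted in the Research Type row: tree-field integration takes a weighted tree, a scalar map f, and a vector attached to every node, and returns, for each node i, the sum over all nodes j of f(dist(i, j)) times the vector at j, where dist is the shortest-path (tree) distance. The paper's FTFIs compute this in polylog-linear time; the snippet below is only a minimal O(N^2) brute-force reference for the quantity that the exact FTFIs are numerically equivalent to. The function name, the dense-adjacency representation, and the exponential kernel in the usage example are illustration choices of ours, not the authors' API.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def tree_field_integrate_bruteforce(adjacency, field, f):
    """O(N^2) brute-force reference for tree-field integration.

    adjacency : (N, N) dense array of edge weights of a weighted tree
                (0 where there is no edge).
    field     : (N, d) array; one d-dimensional vector per node.
    f         : scalar map applied elementwise to tree distances.

    Returns an (N, d) array whose i-th row is sum_j f(dist(i, j)) * field[j].
    """
    # On a tree, all-pairs shortest paths coincide with the unique path distances.
    dist = shortest_path(adjacency, directed=False)
    return f(dist) @ field

# Tiny usage example: a path tree on 4 nodes with unit edge weights
# and an exponential kernel f(d) = exp(-d).
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1.0
X = np.random.randn(4, 3)
out = tree_field_integrate_bruteforce(A, X, lambda d: np.exp(-d))
```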
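The Pseudocode row references Algorithm 1 of Choromanski et al. [2022] (General Efficient Low-Rank Masked Attention). The identity it exploits: if an N×N RPE mask factorizes as M = A B^T with A, B of shape (N, r), then (M ⊙ Q'K'^T) V can be computed without ever materializing an N×N matrix. Below is a minimal, unnormalized NumPy sketch of that identity (names are ours; the full algorithm also handles the attention normalizer, e.g. by running the same computation with V replaced by an all-ones column).

```python
import numpy as np

def low_rank_masked_attention(Qp, Kp, V, A, B):
    """Computes (M * (Qp @ Kp.T)) @ V for M = A @ B.T without forming any
    N x N matrix, where * denotes the elementwise (Hadamard) product.

    Qp, Kp : (N, m) query/key feature maps (e.g. Performer random features).
    V      : (N, d) values.
    A, B   : (N, r) low-rank factors of the mask M.
    Cost   : O(N * r * m * d) instead of O(N^2).
    """
    N, d = V.shape
    r = A.shape[1]
    out = np.zeros((N, d))
    for l in range(r):
        # T_l = sum_j B[j, l] * outer(Kp[j], V[j])  ->  (m, d)
        T_l = (B[:, l, None] * Kp).T @ V
        out += A[:, l, None] * (Qp @ T_l)
    return out

# Sanity check against the brute-force O(N^2) computation.
rng = np.random.default_rng(0)
N, m, d, r = 16, 8, 4, 3
Qp, Kp, V = rng.normal(size=(N, m)), rng.normal(size=(N, m)), rng.normal(size=(N, d))
A, B = rng.normal(size=(N, r)), rng.normal(size=(N, r))
M = A @ B.T
assert np.allclose(low_rank_masked_attention(Qp, Kp, V, A, B),
                   (M * (Qp @ Kp.T)) @ V)
```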
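The evaluation protocol quoted in the Dataset Splits row (the Errica et al. [2020] setup: 10-fold cross-validation repeated 5 times, with standard deviations reported) can be expressed with standard scikit-learn utilities. The sketch below is a generic rendering of that protocol, not the authors' evaluation code; the SVC classifier and the feature-vector input are placeholder choices.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def repeated_10fold_cv(X, y, n_repeats=5, seed=0):
    """10-fold cross-validation repeated n_repeats times.

    X : (n_graphs, n_features) graph-level feature vectors
        (e.g. kernel-derived representations); y : (n_graphs,) class labels.
    Returns the mean and standard deviation of accuracy over all folds.
    """
    scores = []
    for rep in range(n_repeats):
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed + rep)
        clf = SVC(C=1.0)  # stand-in classifier; the paper's models differ
        scores.append(cross_val_score(clf, X, y, cv=cv, scoring="accuracy"))
    scores = np.concatenate(scores)
    return scores.mean(), scores.std()

# Synthetic usage: random features for 100 "graphs" split evenly into 2 classes.
X_demo = np.random.randn(100, 16)
y_demo = np.arange(100) % 2
mean_acc, std_acc = repeated_10fold_cv(X_demo, y_demo)
```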
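Finally, the hyperparameters listed in the Experiment Setup row can be collected into a single config. The dictionary below mirrors Table 6, and the schedule function is our reading of "Warmup 10K" plus "cosine decay" (linear warmup followed by cosine decay to zero), not code released by the authors.

```python
import math

# Hyperparameters from Table 6, gathered into a plain dict for reference.
TT_HPARAMS = {
    "activation": "gelu",
    "dropout_prob": 0.1,
    "attention_dropout_prob": 0.1,
    "optimizer": "adam",
    "base_learning_rate": 1e-3,
    "batch_size": 4096,
    "num_epochs": 300,
    "warmup_steps": 10_000,
    "weight_decay": 0.1,
    "lr_schedule": "cosine_decay",
}

def learning_rate(step, total_steps, hp=TT_HPARAMS):
    """Linear warmup for the first warmup_steps, then cosine decay to 0."""
    base, warmup = hp["base_learning_rate"], hp["warmup_steps"]
    if step < warmup:
        return base * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base * (1.0 + math.cos(math.pi * progress))
```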