Fast Tree-Field Integrators: From Low Displacement Rank to Topological Transformers
Authors: Krzysztof M Choromanski, Arijit Sehanobish, Somnath Basu Roy Chowdhury, Han Lin, Kumar Avinava Dubey, Tamas Sarlos, Snigdha Chaturvedi
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a new class of fast polylog-linear algorithms based on the theory of structured matrices (in particular low displacement rank) for integrating tensor fields defined on weighted trees. Several applications of the resulting fast tree-field integrators (FTFIs) are presented, including (a) approximation of graph metrics with tree metrics, (b) graph classification, (c) modeling on meshes, and finally (d) Topological Transformers (TTs) [Choromanski et al., 2022] for images. For Topological Transformers, we propose new relative position encoding (RPE) masking mechanisms with as few as three extra learnable parameters per Transformer layer, leading to 1.0-1.5%+ accuracy gains. Importantly, most of FTFIs are exact methods, thus numerically equivalent to their brute-force counterparts. When applied to graphs with thousands of nodes, those exact algorithms provide 5.7-13x speedups. We also provide an extensive theoretical analysis of our methods. (A brute-force reference computation of tree-field integration is sketched below the table.) |
| Researcher Affiliation | Collaboration | Krzysztof Choromanski (Google DeepMind, Columbia University), Arijit Sehanobish (Independent), Somnath Basu Roy Chowdhury (UNC Chapel Hill), Han Lin (UNC Chapel Hill), Avinava Dubey (Google Research), Tamas Sarlos (Google Research), Snigdha Chaturvedi (UNC Chapel Hill). |
| Pseudocode | Yes | Algorithm 1: General Efficient Low-Rank Masked Attention from Choromanski et al. [2022]. (A sketch of the low-rank masked-attention identity this algorithm relies on appears below the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/brcsomnath/FastTreeIntegrator. Specifically, we provide there the code for: (1) our algorithm leveraging the Integrator Tree data structure (depicted in Fig. 1), (2) adaptation to the Gromov-Wasserstein-type computation, (3) graph classification and (4) experiments on interpolation on meshes. |
| Open Datasets | Yes | We run experiments on ImageNet and Places365 datasets using ViT-B/16 (see Table 1). ...We also perform similar experiments for graph classification on the CUBES dataset Hanocka et al. [2019]. Specifically, we investigate how the polynomial degree affects the graph classification performance in Fig. 9 (left). Moreover, we benchmark FTFI on ModelNet10 (Wu et al. [2015]), a dataset for 3D Point Cloud (PC) classification. ...Applying FTFI with a topological masking mechanism to the ViViT architecture (factorized Transformer model variant, trained from scratch, as described in Arnab et al. [2021]) results in a 0.8% absolute improvement on the Kinetics dataset (Kay et al. [2017]). |
| Dataset Splits | Yes | We conduct graph classification experiments on a wide range of benchmark datasets. We report the dataset statistics for the graph classification datasets in Table 2. More details about the datasets are available in Morris et al. [2020]. To evaluate the performance of the different kernels, we employ the framework proposed by Errica et al. [2020]. In particular, 10-fold cross-validation is used to obtain an estimate of the generalization performance of our method and the baseline method. We repeat this cross-validation experiment 5 times to get a robust estimate and report the standard deviation for each setup. (The repeated 10-fold protocol is sketched in code below the table.) |
| Hardware Specification | Yes | All experiments were run on a computer with an i9-12900k CPU and 64GB memory. ...Batch size: 4096; Compute resources: 8×8 TPUv3 |
| Software Dependencies | No | The paper mentions using Python for its code and libraries such as the POT library for the baseline experiments, but does not specify exact version numbers for these software dependencies. |
| Experiment Setup | Yes | Table 6: Hyperparameters for Topological Transformer experiments. Activation layer: gelu; Dropout prob: 0.1; Attention dropout prob: 0.1; Optimizer: Adam; Learning rate: 10^-3; Batch size: 4096; Compute resources: 8×8 TPUv3; Number of epochs: 300; Warmup: 10K steps; Weight decay: 0.1; Learning schedule: cosine decay. (These settings are collected into a config sketch below the table.) |
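
For context on the core object quoted in the Research Type row: tree-field integration takes a weighted tree, a scalar map f, and a vector attached to every node, and returns, for each node i, the sum over all nodes j of f(dist(i, j)) times the vector at j, where dist is the shortest-path (tree) distance. The paper's FTFIs compute this in polylog-linear time; the snippet below is only a minimal O(N^2) brute-force reference for the quantity that the exact FTFIs are numerically equivalent to. The function name, the dense-adjacency representation, and the exponential kernel in the usage example are illustration choices of ours, not the authors' API.

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path

def tree_field_integrate_bruteforce(adjacency, field, f):
    """O(N^2) brute-force reference for tree-field integration.

    adjacency : (N, N) dense array of edge weights of a weighted tree
                (0 where there is no edge).
    field     : (N, d) array; one d-dimensional vector per node.
    f         : scalar map applied elementwise to tree distances.

    Returns an (N, d) array whose i-th row is sum_j f(dist(i, j)) * field[j].
    """
    # On a tree, all-pairs shortest paths coincide with the unique path distances.
    dist = shortest_path(adjacency, directed=False)
    return f(dist) @ field

# Tiny usage example: a path tree on 4 nodes with unit edge weights
# and an exponential kernel f(d) = exp(-d).
A = np.zeros((4, 4))
for i in range(3):
    A[i, i + 1] = A[i + 1, i] = 1.0
X = np.random.randn(4, 3)
out = tree_field_integrate_bruteforce(A, X, lambda d: np.exp(-d))
```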
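The Pseudocode row references Algorithm 1 of Choromanski et al. [2022] (General Efficient Low-Rank Masked Attention). The identity it exploits: if an N×N RPE mask factorizes as M = A B^T with A, B of shape (N, r), then (M ⊙ Q'K'^T) V can be computed without ever materializing an N×N matrix. Below is a minimal, unnormalized NumPy sketch of that identity (names are ours; the full algorithm also handles the attention normalizer, e.g. by running the same computation with V replaced by an all-ones column).

```python
import numpy as np

def low_rank_masked_attention(Qp, Kp, V, A, B):
    """Computes (M * (Qp @ Kp.T)) @ V for M = A @ B.T without forming any
    N x N matrix, where * denotes the elementwise (Hadamard) product.

    Qp, Kp : (N, m) query/key feature maps (e.g. Performer random features).
    V      : (N, d) values.
    A, B   : (N, r) low-rank factors of the mask M.
    Cost   : O(N * r * m * d) instead of O(N^2).
    """
    N, d = V.shape
    r = A.shape[1]
    out = np.zeros((N, d))
    for l in range(r):
        # T_l = sum_j B[j, l] * outer(Kp[j], V[j])  ->  (m, d)
        T_l = (B[:, l, None] * Kp).T @ V
        out += A[:, l, None] * (Qp @ T_l)
    return out

# Sanity check against the brute-force O(N^2) computation.
rng = np.random.default_rng(0)
N, m, d, r = 16, 8, 4, 3
Qp, Kp, V = rng.normal(size=(N, m)), rng.normal(size=(N, m)), rng.normal(size=(N, d))
A, B = rng.normal(size=(N, r)), rng.normal(size=(N, r))
M = A @ B.T
assert np.allclose(low_rank_masked_attention(Qp, Kp, V, A, B),
                   (M * (Qp @ Kp.T)) @ V)
```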
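The evaluation protocol quoted in the Dataset Splits row (the Errica et al. [2020] setup: 10-fold cross-validation repeated 5 times, with standard deviations reported) can be expressed with standard scikit-learn utilities. The sketch below is a generic rendering of that protocol, not the authors' evaluation code; the SVC classifier and the feature-vector input are placeholder choices.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.svm import SVC

def repeated_10fold_cv(X, y, n_repeats=5, seed=0):
    """10-fold cross-validation repeated n_repeats times.

    X : (n_graphs, n_features) graph-level feature vectors
        (e.g. kernel-derived representations); y : (n_graphs,) class labels.
    Returns the mean and standard deviation of accuracy over all folds.
    """
    scores = []
    for rep in range(n_repeats):
        cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed + rep)
        clf = SVC(C=1.0)  # stand-in classifier; the paper's models differ
        scores.append(cross_val_score(clf, X, y, cv=cv, scoring="accuracy"))
    scores = np.concatenate(scores)
    return scores.mean(), scores.std()

# Synthetic usage: random features for 100 "graphs" split evenly into 2 classes.
X_demo = np.random.randn(100, 16)
y_demo = np.arange(100) % 2
mean_acc, std_acc = repeated_10fold_cv(X_demo, y_demo)
```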
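Finally, the hyperparameters listed in the Experiment Setup row can be collected into a single config. The dictionary below mirrors Table 6, and the schedule function is our reading of "Warmup 10K" plus "cosine decay" (linear warmup followed by cosine decay to zero), not code released by the authors.

```python
import math

# Hyperparameters from Table 6, gathered into a plain dict for reference.
TT_HPARAMS = {
    "activation": "gelu",
    "dropout_prob": 0.1,
    "attention_dropout_prob": 0.1,
    "optimizer": "adam",
    "base_learning_rate": 1e-3,
    "batch_size": 4096,
    "num_epochs": 300,
    "warmup_steps": 10_000,
    "weight_decay": 0.1,
    "lr_schedule": "cosine_decay",
}

def learning_rate(step, total_steps, hp=TT_HPARAMS):
    """Linear warmup for the first warmup_steps, then cosine decay to 0."""
    base, warmup = hp["base_learning_rate"], hp["warmup_steps"]
    if step < warmup:
        return base * step / warmup
    progress = (step - warmup) / max(1, total_steps - warmup)
    return 0.5 * base * (1.0 + math.cos(math.pi * progress))
```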