Going beyond persistent homology using persistent homology

Authors: Johanna Immonen, Amauri Souza, Vikas Garg

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Theoretical contributions of this work... (Figure 1)... In this section, we compare RePHINE to standard persistence diagrams from an empirical perspective. Our main goal is to evaluate whether our method enables powerful graph-level representation, confirming our theoretical analysis. Therefore, we conduct two main experiments. The first one leverages an artificially created dataset, expected to impose challenges to persistent homology and MP-GNNs. The second experiment aims to assess the predictive performance of RePHINE in combination with GNNs on popular benchmarks for graph classification.
Researcher Affiliation | Collaboration | Johanna Immonen (University of Helsinki) johanna.x.immonen@helsinki.fi; Amauri H. Souza (Aalto University and Federal Institute of Ceará) amauri.souza@aalto.fi; Vikas Garg (Aalto University and YaiYai Ltd) vgarg@csail.mit.edu
Pseudocode | Yes | Algorithm 1: Computing persistence diagrams; Algorithm 2: RePHINE. (A generic persistence sketch, not the paper's listing, appears after the table.)
Open Source Code | Yes | All methods were implemented in PyTorch [31], and our code is available at https://github.com/Aalto-QuML/RePHINE.
Open Datasets | Yes | To assess the performance of RePHINE on real data, we use six popular datasets for graph classification (details in the Supplementary): PROTEINS, IMDB-BINARY, NCI1, NCI109, MOLHIV and ZINC [7, 16, 20]... The ZINC and MOLHIV datasets have public splits. All models are initialized with a learning rate of 10^-3 that is halved if the validation loss does not improve over 10 epochs. (The plateau schedule is sketched after the table.)
Dataset Splits | Yes | For the TUDatasets, we obtain a random 80%/10%/10% (train/val/test) split, which is kept identical across five runs. The ZINC and MOLHIV datasets have public splits. (A seeded-split sketch follows the table.)
Hardware Specification | Yes | For all experiments, we use Tesla V100 GPU cards and consider a memory budget of 32GB of RAM.
Software Dependencies | No | The paper mentions implementation in 'PyTorch [31]' but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | Regarding the training, all models follow the same setting: we apply the Adam optimizer [21] for 2000 epochs with an initial learning rate of 10^-4 that is decreased by half every 400 epochs. We use batches of sizes 5, 8, 32 for the cubic08, cubic10, and cubic12 datasets, respectively... We carry out grid-search for model selection. More specifically, we consider a grid comprised of a combination of {2, 3} GNN layers and {2, 4, 8} filtration functions. We set the number of hidden units in the DeepSets and GNN layers to 64, and of the filtration functions to 16... The GNN node embeddings are combined using a global mean pooling layer. Importantly, for all datasets, we use the same architecture for RePHINE and color-based persistence diagrams. (The schedule and grid search are sketched after the table.)
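
The Pseudocode row lists Algorithm 1 (computing persistence diagrams) and Algorithm 2 (RePHINE), but the listings themselves are not reproduced in this report. For orientation only, below is a minimal sketch of the standard union-find computation of a 0-dimensional persistence diagram for a graph under a vertex (color-based) filtration. This is a generic textbook routine, not the paper's Algorithm 1 or RePHINE; the function name `persistence_diagram_0d` and the lower-star edge convention are assumptions.

```python
# Generic sketch, not the paper's Algorithm 1/2: 0-dimensional persistence of a
# graph under a vertex filtration, computed with union-find and the elder rule.
# Each edge enters at the max of its endpoints' values (lower-star convention).

def persistence_diagram_0d(vertices, edges, filtration):
    parent = {v: v for v in vertices}
    birth = {v: filtration[v] for v in vertices}  # every vertex births a component

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    pairs = []
    # Process edges in the order they enter the filtration.
    for u, v in sorted(edges, key=lambda e: max(filtration[e[0]], filtration[e[1]])):
        death = max(filtration[u], filtration[v])
        ru, rv = find(u), find(v)
        if ru == rv:
            continue  # edge closes a cycle; it does not kill a 0-dim class
        # Elder rule: the component born later dies when the two merge.
        young, old = (ru, rv) if birth[ru] >= birth[rv] else (rv, ru)
        pairs.append((birth[young], death))
        parent[young] = old
    # One essential (never-dying) class per connected component.
    pairs.extend((birth[v], float("inf")) for v in vertices if find(v) == v)
    return pairs

# Example: a 3-vertex path with filtration values 0, 2, 1 yields the pairs
# (2, 2), (1, 2), and (0, inf); the zero-persistence pair (2, 2) is kept.
print(persistence_diagram_0d([0, 1, 2], [(0, 1), (1, 2)], {0: 0.0, 1: 2.0, 2: 1.0}))
```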
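
The Open Datasets row quotes a learning rate of 10^-3 that is halved when the validation loss stops improving for 10 epochs. A minimal PyTorch sketch of that schedule, assuming a placeholder model and a dummy validation loss (neither comes from the released code):

```python
import torch

# Placeholder model standing in for the actual GNN + persistence architecture.
model = torch.nn.Linear(16, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Halve the learning rate once the validation loss has not improved for 10 epochs,
# mirroring the quoted setting for the real-data benchmarks.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=10
)

for epoch in range(100):              # epoch budget here is illustrative
    # ... training steps for one epoch would go here ...
    val_loss = 1.0 / (epoch + 1)      # dummy stand-in for the validation loss
    scheduler.step(val_loss)          # scheduler monitors the validation loss
```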
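
The Dataset Splits row reports a random 80%/10%/10% split of the TUDatasets that is kept identical across five runs. One way to realize such a fixed split is to draw the permutation once from a seeded generator and reuse the resulting indices in every run; the seed value below is an assumption, not taken from the released code.

```python
import torch

def fixed_split_indices(num_graphs, seed=0, fractions=(0.8, 0.1, 0.1)):
    """Return train/val/test index tensors for an 80%/10%/10% split.

    Seeding the generator makes the permutation reproducible, so the same
    partition can be reused across all runs (seed=0 is an assumption).
    """
    gen = torch.Generator().manual_seed(seed)
    perm = torch.randperm(num_graphs, generator=gen)
    n_train = int(fractions[0] * num_graphs)
    n_val = int(fractions[1] * num_graphs)
    return perm[:n_train], perm[n_train:n_train + n_val], perm[n_train + n_val:]

# Example: split the 1113 graphs of PROTEINS once and reuse the indices in each run.
train_idx, val_idx, test_idx = fixed_split_indices(1113)
```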
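
The Experiment Setup row quotes Adam with an initial learning rate of 10^-4 halved every 400 epochs over 2000 epochs, plus a grid search over {2, 3} GNN layers and {2, 4, 8} filtration functions. A hedged sketch of that schedule and grid enumeration follows; `build_model` is a hypothetical placeholder, not the released API.

```python
import itertools
import torch

def build_model(num_layers, num_filtrations, hidden=64, filtration_hidden=16):
    # Hypothetical constructor standing in for the GNN + RePHINE architecture
    # (64 hidden units for the GNN/DeepSets layers, 16 for filtration functions).
    return torch.nn.Linear(hidden, 1)

def train_one_config(model, num_epochs=2000):
    # Quoted schedule for the synthetic (cubic) datasets: Adam with an initial
    # learning rate of 1e-4, halved every 400 epochs. The loop body is omitted.
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=400, gamma=0.5)
    for _ in range(num_epochs):
        # ... forward/backward pass and optimizer.step() would go here ...
        scheduler.step()

# Grid search from the quoted setup: {2, 3} GNN layers x {2, 4, 8} filtrations.
for num_layers, num_filtrations in itertools.product([2, 3], [2, 4, 8]):
    train_one_config(build_model(num_layers, num_filtrations))
```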