HypeBoy: Generative Self-Supervised Representation Learning on Hypergraphs

Authors: Sunwoo Kim, Shinhwan Kang, Fanchen Bu, Soo Yong Lee, Jaemin Yoo, Kijung Shin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate that HYPEBOY learns effective general-purpose hypergraph representations. It significantly outperforms 16 baseline methods across 11 benchmark datasets. Code and datasets are available at https://github.com/kswoo97/hypeboy." (Abstract) ... "We assess the generalizability of learned representations from HYPEBOY in two downstream tasks: node classification and hyperedge prediction. ... As shown in Table 1, HYPEBOY shows the best average ranking among all 18 methods." (Section 5.1)
Researcher Affiliation | Collaboration | Sunwoo Kim, Shinhwan Kang, Fanchen Bu, Soo Yong Lee, Jaemin Yoo, Kijung Shin; Kim Jaechul Graduate School of AI, School of Electrical Engineering, Korea Advanced Institute of Science and Technology (KAIST); {kswoo97, shinhwan.kang, boqvezen97, syleetolow, jaemin, kijungs}@kaist.ac.kr ... "This work was supported by Samsung Electronics Co., Ltd. and Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2022-0-00871, Development of AI Autonomy and Knowledge Enhancement for AI Agent Collaboration) (No. 2019-0-00075, Artificial Intelligence Graduate School Program (KAIST))."
Pseudocode | Yes | "Algorithm 1: Node swapping algorithm" (a hedged sketch of such a node-swapping step appears after the table)
Open Source Code | Yes | "Code and datasets are available at https://github.com/kswoo97/hypeboy."
Open Datasets | Yes | "For experiments, we use 11 benchmark hypergraph datasets. ... The Cora, Citeseer, Pubmed, Cora-CA, and DBLP-P datasets are from the work of Yadati et al. ... The DBLP-A and IMDB datasets are from the work of Wang et al. (2019). ... The AMiner dataset is from the work of Zhang et al. (2019). ... The ModelNet-40 (MN-40) dataset is from the work of Wu et al. (2015). ... The 20Newsgroups (20News) dataset is from the work of Dua et al. (2017). ... The House dataset is from the work of Chien et al. (2022)."
Dataset Splits | Yes | "Following Wei et al. (2022), we randomly split the nodes into training/validation/test sets with the ratio of 1%/1%/98%, respectively. ... For hyperedge prediction, we split hyperedges into training/validation/test sets by the ratio of 60%/20%/20%." (see the split sketch after the table)
Hardware Specification | Yes | "All experiments are conducted on a machine with NVIDIA RTX 8000 D6 GPUs (48GB memory) and two Intel Xeon Silver 4214R processors."
Software Dependencies | No | The paper mentions several software components, such as the Adam optimizer, UniGCNII, GCN, and activation functions like ReLU, but it does not specify version numbers for these components or for the libraries that implement them.
Experiment Setup | Yes | "We fix the hidden dimension and dropout rate of all models as 128 and 0.5, respectively. When training any neural network for downstream tasks, we train a model for 200 epochs, and for every 10 epochs, we evaluate the validation accuracy of the model. ... For a linear evaluation protocol of node classification, we utilize a logistic classifier with a learning rate of 0.001. ... For all the supervised models, we tune the learning rate as a hyperparameter within {0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001}. ... For HYPEBOY, we tune the feature augmentation magnitude p_x within {0.0, 0.1, 0.2, 0.3, 0.4} and the hyperedge augmentation magnitude p_e within {0.5, 0.6, 0.7, 0.8, 0.9}. We fix the learning rate and training epochs of the feature reconstruction warm-up as 0.001 and 300, respectively." (see the configuration sketch after the table)
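
The Pseudocode row points to Algorithm 1, a node swapping algorithm. The paper's exact procedure is available in the linked repository; purely as an illustration, a minimal sketch of node swapping as commonly used to corrupt hyperedges is given below. The function name swap_node, the choice to swap a single node, and the rejection-sampling details are our assumptions, not the paper's specification.

```python
import random

def swap_node(hyperedge, num_nodes, rng=random):
    """Return a corrupted (negative) copy of a hyperedge: one member node
    is swapped for a node sampled from outside the hyperedge."""
    corrupted = set(hyperedge)
    corrupted.remove(rng.choice(tuple(corrupted)))  # drop one member at random
    while True:  # rejection-sample a replacement not already in the hyperedge
        candidate = rng.randrange(num_nodes)
        if candidate not in hyperedge:
            corrupted.add(candidate)
            return corrupted

# Example: corrupt one hyperedge of a toy 10-node hypergraph.
negative_hyperedge = swap_node({0, 2, 5}, num_nodes=10)
```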
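
For the Dataset Splits row, the reported ratios are straightforward to reproduce with a random permutation. The sketch below is a generic way to draw such splits; the helper name, the fixed seed, and the rounding behavior are assumptions, since the paper's own splitting code lives in the repository.

```python
import numpy as np

def random_split(n, ratios, seed=0):
    """Randomly partition indices 0..n-1 into three sets with the given ratios."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    cut1 = round(ratios[0] * n)
    cut2 = cut1 + round(ratios[1] * n)
    return perm[:cut1], perm[cut1:cut2], perm[cut2:]

# Node classification: 1% / 1% / 98% node split (following Wei et al., 2022).
num_nodes = 2708  # e.g., Cora
train_nodes, val_nodes, test_nodes = random_split(num_nodes, (0.01, 0.01, 0.98))

# Hyperedge prediction: 60% / 20% / 20% hyperedge split.
num_hyperedges = 1000  # illustrative placeholder, not a dataset statistic
train_e, val_e, test_e = random_split(num_hyperedges, (0.60, 0.20, 0.20))
```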
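
The Experiment Setup row fixes some hyperparameters and grid-searches others. The sketch below simply collects the reported values in one place and enumerates the HYPEBOY augmentation grid; the dictionary layout and the train_and_validate entry point are hypothetical, not part of the released code.

```python
from itertools import product

# Values reported in the paper (the naming is ours).
FIXED = {
    "hidden_dim": 128,         # hidden dimension of all models
    "dropout": 0.5,            # dropout rate of all models
    "downstream_epochs": 200,  # downstream training length
    "eval_every": 10,          # validation accuracy checked every 10 epochs
    "linear_probe_lr": 0.001,  # logistic classifier for linear evaluation
    "warmup_lr": 0.001,        # feature-reconstruction warm-up
    "warmup_epochs": 300,
}
SUPERVISED_LRS = [0.05, 0.01, 0.005, 0.001, 0.0005, 0.0001]  # tuned per model
HYPEBOY_GRID = {
    "p_x": [0.0, 0.1, 0.2, 0.3, 0.4],  # feature augmentation magnitude
    "p_e": [0.5, 0.6, 0.7, 0.8, 0.9],  # hyperedge augmentation magnitude
}

for p_x, p_e in product(HYPEBOY_GRID["p_x"], HYPEBOY_GRID["p_e"]):
    config = {**FIXED, "p_x": p_x, "p_e": p_e}
    # train_and_validate(config)  # hypothetical training entry point
    print(config)
```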