Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DivGCL: A Graph Contrastive Learning Model for Diverse Recommendation

Authors: Wenwen Gong, Yangliao Geng, Dan Zhang, Yifan Zhu, Xiaolong Xu, Haolong Xiang, Amin Beheshti, Xuyun Zhang, Lianyong Qi

AAAI 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments on four popular datasets demonstrate that DivGCL surpasses existing approaches in balancing accuracy and diversity, with an improvement of 23.47% at T@20 (abbreviation for trade-off metric) on ML-1M.
Researcher Affiliation Academia Wenwen Gong1, Yangliao Geng2, Dan Zhang3, Yifan Zhu4, Xiaolong Xu5, Haolong Xiang5, Amin Beheshti6, Xuyun Zhang6, Lianyong Qi7,8*
1 School of Information and Electrical Engineering, China Agricultural University, China
2 School of Computer Science & Technology, Beijing Jiaotong University, China
3 Department of Computer Science and Technology, Tsinghua University, China
4 School of Computer Science, Beijing University of Posts and Telecommunications, China
5 School of Software, Nanjing University of Information Science and Technology, China
6 School of Computing, Macquarie University, Australia
7 College of Computer Science and Technology, China University of Petroleum (East China), China
8 State Key Lab. for Novel Software Technology, Nanjing University, China
EMAIL, EMAIL, EMAIL, yifan EMAIL, EMAIL, EMAIL, EMAIL
Pseudocode Yes Algorithm 1: Training Process of DivGCL
Open Source Code No The paper states: "for fair comparisons, we implement all experiments in the framework in SimGCL (Yu et al. 2022)." However, it does not provide any concrete access information (a link or an explicit statement of release) for the source code of DivGCL itself.
Open Datasets Yes To verify the performance of DivGCL, we perform comprehensive experiments on four widely used public datasets from different domains, including MovieLens 1M (ML-1M), Beauty, AMiner, and Yelp2018.
Dataset Splits Yes As for the Yelp2018 dataset, we follow the split ratio used in SimGCL, i.e., a 7:1:2 ratio for the training, validation, and test sets. For the Beauty and ML-1M datasets, we apply the same segmentation and retain only users with more than 10 interaction records, where the main category is selected as the final category.
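The per-user 7:1:2 split with the more-than-10-interactions filter can be sketched in plain Python. This is an illustrative sketch only: the function name, seeding, and shuffling strategy are assumptions, and the authors' actual preprocessing (inherited from SimGCL) may differ in details.

```python
import random

def split_interactions(user_items, min_interactions=10,
                       ratios=(0.7, 0.1, 0.2), seed=42):
    """Split each user's interactions into train/val/test at a 7:1:2 ratio,
    dropping users with `min_interactions` or fewer records (as described
    for the Beauty and ML-1M datasets). Hypothetical helper, not the
    authors' code. `user_items` maps user id -> list of item ids."""
    rng = random.Random(seed)
    train, val, test = {}, {}, {}
    for user, items in user_items.items():
        if len(items) <= min_interactions:
            continue  # filter out sparse users
        items = items[:]          # copy so the caller's list is untouched
        rng.shuffle(items)
        n = len(items)
        n_train = int(ratios[0] * n)
        n_val = int(ratios[1] * n)
        train[user] = items[:n_train]
        val[user] = items[n_train:n_train + n_val]
        test[user] = items[n_train + n_val:]  # remainder goes to test
    return train, val, test
```

For example, a user with 20 interactions ends up with 14 training, 2 validation, and 4 test items, while a user with only 5 interactions is dropped entirely.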
Hardware Specification Yes All our experimental comparisons are performed on an Ubuntu server with hardware settings (18.04.4 LTS server with Intel(R) Xeon(R) Gold 6240 16-Core Processor, and GeForce RTX 3090 GPU) and software settings (Python 3.9.7, PyTorch).
Software Dependencies Yes All our experimental comparisons are performed on an Ubuntu server with hardware settings (18.04.4 LTS server with Intel(R) Xeon(R) Gold 6240 16-Core Processor, and GeForce RTX 3090 GPU) and software settings (Python 3.9.7, PyTorch).
Experiment Setup Yes The training batch size on the ML-1M, Beauty, AMiner, and Yelp2018 datasets is set to 256, 64, 256, and 128, respectively. For all models, we employ Xavier initialization to initialize the parameters and the Adam optimizer with a learning rate of 1e-3 as the default optimizer. The backbone in DivGCL is a 2-layer LightGCN that propagates the user-item interactions. Early stopping is used if Recall@20 on the validation dataset does not increase for the next 10 epochs. Hyperparameter analysis also details the embedding size d, perturbation strength σ, weight α, and number of graph convolution layers l.
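The early-stopping rule described above (stop when validation Recall@20 fails to improve for 10 consecutive epochs) can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation: `run_epoch` is a hypothetical stand-in for one training pass followed by validation, and the patience bookkeeping is my own.

```python
def train_with_early_stopping(run_epoch, max_epochs=500, patience=10):
    """Track the best validation Recall@20 and stop once `patience`
    epochs pass without improvement. `run_epoch(epoch)` is assumed to
    train for one epoch and return the validation Recall@20."""
    best, best_epoch = float("-inf"), -1
    for epoch in range(max_epochs):
        recall_at_20 = run_epoch(epoch)
        if recall_at_20 > best:
            best, best_epoch = recall_at_20, epoch  # new best checkpoint
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best, best_epoch
```

With a metric trace that peaks at epoch 2 and then plateaus, training would halt at epoch 12 and report the epoch-2 best.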