Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Behavior Importance-Aware Graph Neural Architecture Search for Cross-Domain Recommendation
Authors: Chendi Ge, Xin Wang, Ziwei Zhang, Yijian Qin, Hong Chen, Haiyang Wu, Yang Zhang, Yuekui Yang, Wenwu Zhu
AAAI 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on benchmark CDR datasets and a large-scale industry advertising dataset demonstrate that BiGNAS consistently outperforms state-of-the-art baselines. |
| Researcher Affiliation | Collaboration | (1) Department of Computer Science and Technology, Tsinghua University; (2) Beijing National Research Center for Information Science and Technology, Tsinghua University; (3) Machine Learning Platform Department, Tencent TEG |
| Pseudocode | No | The paper describes the proposed method using mathematical formulations, equations, and diagrams (Figure 1), but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | Code: https://github.com/gcd19/BiGNAS |
| Open Datasets | Yes | In this paper, we use the Amazon Product 5-core dataset (McAuley et al. 2015; He and McAuley 2016) for dual-domain recommendation due to its broad user interactions across diverse product categories, which makes it a standard choice for cross-domain recommendation (CDR) research. |
| Dataset Splits | Yes | Due to limited user overlap across multiple domains, we focus on dual-domain recommendation tasks, using the same domain pairs and splits as in BIAO (Chen et al. 2023a). After the inner model converges with the current perceptron parameters, the perceptron is updated. Repeating this process throughout training ensures the convergence of both models, allowing BiGNAS to jointly optimize the recommendation architecture and behavior data importance. We select AUC as the main metric and report the test results of the model with the highest AUC score on the validation set. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only mentions the implementation framework. |
| Software Dependencies | No | All methods are implemented in PyTorch and PyTorch Geometric (Fey and Lenssen 2019). The paper mentions the software frameworks but does not specify their version numbers. |
| Experiment Setup | Yes | The supernetwork consists of two layers, with a GNN architecture search space specifically designed for recommendation tasks, including GCN (Kipf and Welling 2017), GAT (Velickovic et al. 2018), GraphSAGE (Hamilton, Ying, and Leskovec 2017), LightGCN (He et al. 2020), and a Linear layer. During training, we employ an early stopping strategy with a maximum of 100 epochs, and a 10-epoch warm-up before bi-level optimization, using Adam as the optimizer. Each method is run with five different random seeds, with performance metrics reported as the average of these runs. |
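To make the "two-layer supernetwork with a candidate-op search space" in the setup row concrete, here is a minimal, dependency-free sketch. Each candidate op (GCN, GAT, GraphSAGE, LightGCN, Linear in the paper) is reduced to a toy scalar aggregator, and the candidates are combined with a softmax-weighted mixture per layer, as in typical one-shot/DARTS-style architecture search. The mixing scheme and all op implementations here are illustrative assumptions, not the paper's actual method; see the linked repository for the real implementation.

```python
import math

# Toy stand-ins for the candidate aggregation ops in the search space.
# Node features are scalars so the sketch needs no external libraries.
def mean_agg(h, nbrs):   # GCN-like: mean over self and neighbours
    return sum(nbrs, h) / (len(nbrs) + 1)

def max_agg(h, nbrs):    # GraphSAGE-like (max-pooling variant)
    return max(nbrs + [h])

def nbr_mean(h, nbrs):   # LightGCN-like: neighbours only, no self term
    return sum(nbrs) / len(nbrs)

def identity(h, nbrs):   # Linear-layer stand-in: pass the node state through
    return h

CANDIDATES = [mean_agg, max_agg, nbr_mean, identity]

def softmax(alphas):
    m = max(alphas)
    exps = [math.exp(a - m) for a in alphas]
    s = sum(exps)
    return [e / s for e in exps]

def mixed_layer(features, adj, alphas):
    """One supernet layer: every candidate op runs, outputs are softmax-mixed."""
    weights = softmax(alphas)
    out = []
    for v, h in enumerate(features):
        nbrs = [features[u] for u in adj[v]]
        out.append(sum(w * op(h, nbrs) for w, op in zip(weights, CANDIDATES)))
    return out

def supernet_forward(features, adj, arch):
    """Two mixed layers, matching the two-layer supernetwork in the setup row.

    `arch` holds one architecture-weight vector (alphas) per layer; in real
    one-shot search these would be learned in the outer optimization loop.
    """
    h = features
    for alphas in arch:
        h = mixed_layer(h, adj, alphas)
    return h
```

With uniform architecture weights (all alphas equal), each layer simply averages the four candidate ops; as the outer loop sharpens the alphas, the mixture concentrates on the best-performing op per layer, which is the architecture ultimately selected.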