Efficient Correlated Subgraph Searches for AI-powered Drug Discovery

Authors: Hiroaki Shiokawa, Yuma Naoi, Shohei Matsugu

IJCAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental analysis confirms that Corgi has a shorter running time and improved accuracy compared to existing state-of-the-art methods, while a case study demonstrates that Corgi is suitable for practical AI-powered drug discovery.
Researcher Affiliation | Academia | Hiroaki Shiokawa (1), Yuma Naoi (2), and Shohei Matsugu (2). (1) Center for Computational Sciences, University of Tsukuba, Japan; (2) Graduate School of Science and Technology, University of Tsukuba, Japan
Pseudocode | Yes | Algorithm 1 (Phase 1) View generation; Algorithm 2 (Phase 2) Mv Tk search
Open Source Code | No | The paper states 'All methods were implemented in C/C++ using the -O3 option.' but provides no link to, or explicit statement about, a public release of the source code for the methodology.
Open Datasets | Yes | We tested 12 public molecule databases published by NCI [Nicklaus et al., 2012], DUD-E [Mysinger et al., 2012], LIT-PCBA [Nguyen et al., 2020], and ZINC20 [Irwin et al., 2012]. Table 2 shows their statistics, where n, n , and d denote the average graph size, the average summarized graph view size, and the average degree, respectively. For more details, please refer to Appendix A.
Dataset Splits | No | The paper addresses a correlated subgraph search problem and does not use traditional machine learning dataset splits (e.g., training, validation, and test sets) for model training or evaluation. The 'validation step' mentioned in Section 3.3 is a step within the algorithm that filters results, not a dataset split.
Hardware Specification | Yes | Evaluations were conducted on a server with an Intel Xeon CPU 2.90 GHz and 1 TiB RAM.
Software Dependencies | No | The paper states 'All methods were implemented in C/C++ using the -O3 option.' However, it does not name any software libraries or frameworks used in the implementation, nor their version numbers.
Experiment Setup | Yes | We employed TopCor [Ke et al., 2009] for the CSS method invoked in Algorithm 2 and set ϵ = 0.05 and T to the smallest value derived by Lemma 5. For each database, p is set to the largest possible value. ... Consistent with [Ke et al., 2009; Prateek et al., 2020], ten queries are generated for each database by randomly selecting subgraphs from the database. The results are averaged over the above ten queries.
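
The query protocol quoted under Experiment Setup (ten queries per database, each a randomly selected subgraph, with results averaged over the ten) can be illustrated with a small sketch. The paper does not say how the subgraphs are sampled, so everything below is an assumption made for illustration: graphs are plain adjacency dictionaries, a query is grown as a connected vertex set from a random seed vertex, and the names `sample_connected_subgraph`, `generate_queries`, and `average_runtime` are hypothetical. The `search` argument is a placeholder for whatever correlated subgraph search method is being timed; it is not the authors' implementation.

```python
import random
import time
from statistics import mean


def sample_connected_subgraph(adj, size, rng):
    """Grow a connected vertex set of at most `size` vertices (assumed sampling scheme)."""
    start = rng.choice(sorted(adj))
    chosen = {start}
    frontier = set(adj[start]) - chosen
    while len(chosen) < size and frontier:
        v = rng.choice(sorted(frontier))
        chosen.add(v)
        # Extend the frontier with the new vertex's neighbours, minus picked vertices.
        frontier |= set(adj[v])
        frontier -= chosen
    # Keep every edge of the source graph whose endpoints were both chosen.
    edges = {(u, v) for u in chosen for v in adj[u] if v in chosen and u < v}
    return chosen, edges


def generate_queries(database, num_queries=10, size=4, seed=0):
    """Draw `num_queries` query subgraphs, each from a randomly chosen database graph."""
    rng = random.Random(seed)
    return [
        sample_connected_subgraph(rng.choice(database), size, rng)
        for _ in range(num_queries)
    ]


def average_runtime(search, queries):
    """Average wall-clock time in seconds of `search` over all queries."""
    times = []
    for query in queries:
        t0 = time.perf_counter()
        search(query)
        times.append(time.perf_counter() - t0)
    return mean(times)
```

Fixing the seed keeps the ten queries identical across the methods being compared, so averaged running times refer to the same workload.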