Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
CFD: Learning Generalized Molecular Representation via Concept-Enhanced Feedback Disentanglement
Authors: Aming Wu, Cheng Deng
ICLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct extensive experiments to answer the research questions. (1) Can our method CFD achieve better OOD generalization performance against baselines? (2) Does our method possess the ability to capture important substructures and improve the performance of molecular substructure prediction? (3) How does each component contribute to the final performance? |
| Researcher Affiliation | Academia | Aming Wu, Cheng Deng, School of Electronic Engineering, Xidian University, Xi'an, China |
| Pseudocode | Yes | Algorithm 1 Training Process of Concept-Enhanced Feedback Disentanglement |
| Open Source Code | Yes | The source code will be released at https://github.com/Aming Wu/Molecule CFD. |
| Open Datasets | Yes | For OOD molecular representation learning, we follow the settings of Zhuang et al. (2023) and employ two real-world benchmarks: GOOD (Gui et al., 2022), a systematic benchmark tailored specifically to graph OOD problems, and DrugOOD (Ji et al., 2022), an OOD benchmark for AI-aided drug discovery. In addition, for molecular ground-state prediction, we follow the settings of Xu et al. (2024) and use Molecule3D (Xu et al., 2021b) and QM9 (Ramakrishnan et al., 2014) to evaluate our method's ability to focus on substructures. |
| Dataset Splits | Yes | Each GOOD dataset uses two environment-splitting strategies (scaffold and size), and two shift types (covariate and concept) are applied to each split, resulting in a total of 12 distinct datasets (Table 1). Furthermore, DrugOOD (Ji et al., 2022) provides three environment-splitting strategies (assay, scaffold, and size) and applies each of them to two measurements (IC50 and EC50), yielding 6 datasets (Table 2)... In this paper, we employ random splitting according to the same distribution based on the molecule's core component. |
| Hardware Specification | No | No specific hardware details (GPU models, CPU models, etc.) are mentioned in the paper for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies or version numbers for libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | τ and λ are hyper-parameters set to 1.0 and 0.01, respectively. β is a hyper-parameter set to 0.5. α1 and α2 are hyper-parameters set to 0.1 and 0.01, respectively. The number of learned concepts is set to 12, and the number of feedback iterations is set to 8. |
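The reported setup and split counts can be collected in a small configuration sketch. This is an illustration only, not the authors' code: the field names (`tau`, `lam`, `beta`, `alpha1`, `alpha2`, `num_concepts`, `feedback_iters`) and helper functions are hypothetical, the excerpt does not say what each hyper-parameter controls, and the "three GOOD datasets" figure is arithmetic inferred from the stated total of 12 rather than a number given in the paper.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class CFDConfig:
    """Reported CFD hyper-parameter values (field names are hypothetical)."""
    tau: float = 1.0          # τ
    lam: float = 0.01         # λ
    beta: float = 0.5         # β
    alpha1: float = 0.1       # α1
    alpha2: float = 0.01      # α2
    num_concepts: int = 12    # number of learned concepts
    feedback_iters: int = 8   # number of feedback iterations


def drugood_configs():
    """DrugOOD settings: 2 measurements x 3 splitting strategies = 6 datasets."""
    measurements = ("IC50", "EC50")
    strategies = ("assay", "scaffold", "size")
    return list(product(measurements, strategies))


def good_configs_per_dataset():
    """Per GOOD dataset: 2 environment splits x 2 shift types = 4 settings."""
    return list(product(("scaffold", "size"), ("covariate", "concept")))


if __name__ == "__main__":
    print(CFDConfig())
    print(len(drugood_configs()))           # 6 DrugOOD datasets
    per_dataset = len(good_configs_per_dataset())
    # A total of 12 GOOD datasets at 4 settings each implies 3 base datasets.
    print(12 // per_dataset)
```

A frozen dataclass keeps the reported values immutable, so an accidental override during a reproduction attempt raises an error instead of silently changing the setup.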