Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

RetroMoE: A Mixture-of-Experts Latent Translation Framework for Single-step Retrosynthesis

Authors: Xinjie Li, Abhinav Verma

IJCAI 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on the USPTO-50K and USPTO-MIT datasets demonstrate the superiority of our method, which not only surpasses most semi-template-based and template-free methods but also delivers competitive results against template-based methods."
Researcher Affiliation | Academia | Xinjie Li, Abhinav Verma, Pennsylvania State University, USA
Pseudocode | No | The paper describes the methodology using text, mathematical equations, and architectural diagrams (Figures 2 and 3), but does not include any explicitly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code, nor does it include a link to a code repository.
Open Datasets | Yes | "In our study, we utilize two established retrosynthesis benchmark datasets: USPTO-50k [Schneider et al., 2016] and USPTO-MIT [Jin et al., 2017]."
Dataset Splits | Yes | "The USPTO-50k dataset ... is divided into training, validation, and test sets with 40,008, 5,001, and 5,007 reactions, respectively... The USPTO-MIT dataset contains approximately 479,000 atom-mapped reactions, with around 409,000 for training, 40,000 for validation, and 30,000 for testing."
Hardware Specification | Yes | "We conduct the experiments using the PyTorch framework on NVIDIA A5000 GPUs."
Software Dependencies | No | The paper mentions using the "PyTorch framework" but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | "In our experiments on the USPTO-50K dataset, we use the Adam optimizer with an initial learning rate of 1.25e-4 for the first training stage and 1e-4 for the second stage. The training lasts for 45 epochs in the first stage and 170 epochs in the second. We apply an exponential scheduler for learning rate decay and set the β value in Loss1 to 0.001. Both the graph encoder and transformer decoder in each stage have a hidden size of 512, 8 layers for both encoder and decoder, and 8 attention heads. The MoE network consists of 3 gating layers, 8 expert layers, and 3 experts. For the experiments on the USPTO-MIT dataset, we use the Adam optimizer with an initial learning rate of 1e-4 for the first phase and 5e-5 for the subsequent phase. The training duration is 85 epochs for the first stage and 300 epochs for the second. We continue using an exponential scheduler for learning rate decay, with the β value in Loss1 set to 0.001. In terms of model configuration, both stages feature a graph encoder and transformer decoder with a hidden size of 768, 8 encoder and decoder layers, and 12 attention heads. The MoE network includes 3 gating layers, 8 expert layers, and 3 experts."
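The quoted experiment setup can be collected into a small, self-contained sketch for readers attempting a reproduction. This is not the authors' code: the dictionary keys and helper function below are hypothetical names of ours, while every value is taken from the setup quoted above.

```python
# Hypothetical reproduction sketch: the per-dataset hyperparameters reported
# in the paper's Experiment Setup, gathered as plain dictionaries. Key names
# are our own invention; values come from the quoted text.
CONFIGS = {
    "USPTO-50K": {
        "optimizer": "Adam",
        "lr_stage1": 1.25e-4,     # first training stage
        "lr_stage2": 1e-4,        # second training stage
        "epochs_stage1": 45,
        "epochs_stage2": 170,
        "lr_schedule": "exponential",
        "beta_loss1": 0.001,      # β weight in Loss1
        "hidden_size": 512,       # graph encoder and transformer decoder
        "num_layers": 8,          # encoder and decoder each
        "attention_heads": 8,
        "moe_gating_layers": 3,
        "moe_expert_layers": 8,
        "moe_num_experts": 3,
    },
    "USPTO-MIT": {
        "optimizer": "Adam",
        "lr_stage1": 1e-4,
        "lr_stage2": 5e-5,
        "epochs_stage1": 85,
        "epochs_stage2": 300,
        "lr_schedule": "exponential",
        "beta_loss1": 0.001,
        "hidden_size": 768,
        "num_layers": 8,
        "attention_heads": 12,
        "moe_gating_layers": 3,
        "moe_expert_layers": 8,
        "moe_num_experts": 3,
    },
}


def total_epochs(dataset: str) -> int:
    """Total training epochs across both stages for a dataset."""
    cfg = CONFIGS[dataset]
    return cfg["epochs_stage1"] + cfg["epochs_stage2"]
```

Under these reported values, the full schedule amounts to 215 epochs on USPTO-50K and 385 epochs on USPTO-MIT; since neither PyTorch version nor batch size is stated in the paper, both remain open choices for a reproducer.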