Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Transformer-based Objective-reinforced Generative Adversarial Network to Generate Desired Molecules

Authors: Chen Li, Chikashige Yamanaka, Kazuma Kaitoh, Yoshihiro Yamanishi

IJCAI 2022 | Venue PDF | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments were performed using the ZINC chemical dataset, and the results demonstrated the usefulness of TransORGAN in terms of the uniqueness, novelty, and diversity of the generated molecules. |
| Researcher Affiliation | Academia | Chen Li, Chikashige Yamanaka, Kazuma Kaitoh and Yoshihiro Yamanishi. Department of Bioscience and Bioinformatics, Faculty of Computer Science and Systems Engineering, Kyushu Institute of Technology, Iizuka, Japan. EMAIL, EMAIL, EMAIL |
| Pseudocode | Yes | Algorithm 1: MC search under policy Gθ; Algorithm 2: Pre-training/training for TransORGAN |
| Open Source Code | No | The paper does not provide an explicit statement or a direct link to open-source code for the described methodology. |
| Open Datasets | Yes | The test data were a subset of the ZINC chemical dataset [Ramakrishnan et al., 2014], which contains 134,000 molecules represented by SMILES strings. |
| Dataset Splits | No | The paper states that it used a subset of the ZINC dataset and discusses pre-training and adversarial training, but it does not provide specific train/validation/test splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper mentions using PyTorch but does not provide hardware details such as the GPU or CPU models used for the experiments. |
| Software Dependencies | Yes | All experiments were performed using PyTorch version 1.8.1. |
| Experiment Setup | Yes | We set the dimension of the word embedding to 16 and the dropout rate to 0.2. The encoder and decoder each had four heads and two stacked layers. The generator was pre-trained over 100 epochs by maximum likelihood estimation (MLE). For the discriminator, the dimension of the word embedding was 32. We set the numbers of kernels to 1, 3, 5, 7, and 9; the kernel sizes to 20, 30, 40, 50, and 60; and the dropout rate to 0.75. In the pre-training phase, the discriminator was pre-trained over ten epochs. In addition, we set the tradeoff between maintaining the likelihood and RL to λ = 0.5. The MC search time N was set to 16. |
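Since the paper's code is not released, the reported hyperparameters can be collected into a configuration sketch for anyone attempting reproduction. The dictionary names below are illustrative assumptions, not identifiers from the paper; the values are taken directly from the Experiment Setup row above.

```python
# Hypothetical config dicts restating the hyperparameters reported in the paper.
# Names (GENERATOR_CFG, etc.) are our own; only the values come from the paper.
GENERATOR_CFG = {
    "embedding_dim": 16,     # word embedding dimension
    "dropout": 0.2,
    "num_heads": 4,          # attention heads in encoder and decoder
    "num_layers": 2,         # stacked layers in encoder and decoder
    "pretrain_epochs": 100,  # MLE pre-training
}

DISCRIMINATOR_CFG = {
    "embedding_dim": 32,
    "num_kernels": [1, 3, 5, 7, 9],
    "kernel_sizes": [20, 30, 40, 50, 60],
    "dropout": 0.75,
    "pretrain_epochs": 10,
}

TRAINING_CFG = {
    "lambda_tradeoff": 0.5,  # balance between MLE likelihood and RL objective
    "mc_search_N": 16,       # Monte Carlo search rollouts per partial sequence
}
```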
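The Algorithm 1 referenced above (MC search under policy Gθ) follows the SeqGAN-style rollout idea: a partially generated sequence is completed N times by sampling from the generator policy, and the discriminator's scores on the completions are averaged to estimate a reward for the partial sequence. A minimal sketch, using toy stand-ins for the policy and discriminator (the real ones are neural networks not shown here):

```python
import random

def mc_rollout_reward(partial_seq, policy_sample, discriminator, seq_len, n_rollouts=16):
    """Estimate the reward of a partial sequence by Monte Carlo search:
    complete it n_rollouts times under the generator policy and average
    the discriminator scores (N = 16 in the paper's setup)."""
    total = 0.0
    for _ in range(n_rollouts):
        completed = policy_sample(partial_seq, seq_len)  # roll out to full length
        total += discriminator(completed)                # score in [0, 1]
    return total / n_rollouts

# Toy stand-ins (hypothetical, for illustration only):
VOCAB = list("CNO()=123")  # a tiny SMILES-like token set

def toy_policy(prefix, seq_len):
    # Complete the prefix with uniformly random tokens up to seq_len.
    return prefix + [random.choice(VOCAB) for _ in range(seq_len - len(prefix))]

def toy_discriminator(seq):
    # A trivial "discriminator" that accepts any full-length sequence.
    return 1.0 if len(seq) == 10 else 0.0

reward = mc_rollout_reward(list("CC"), toy_policy, toy_discriminator,
                           seq_len=10, n_rollouts=16)
```

In the actual method, `policy_sample` would be sampling from the transformer generator Gθ and `discriminator` would be the CNN discriminator; the averaged reward feeds the policy-gradient update.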