TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search

Authors: Tarun Gogineni, Ziping Xu, Exequiel Punzalan, Runxuan Jiang, Joshua Kammeraad, Ambuj Tewari, Paul Zimmerman

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that TorsionNet outperforms the highest-scoring chemoinformatics method by 4x on large branched alkanes, and by several orders of magnitude on the previously unexplored biopolymer lignin, with applications in renewable energy.
Researcher Affiliation | Academia | Department of EECS, University of Michigan; Department of Statistics, University of Michigan; Department of Chemistry, University of Michigan. {tgog,zipingxu,epunzal,runxuanj,joshkamm,tewaria,paulzim}@umich.edu
Pseudocode | Yes | Our doubling curriculum trains on the set X_t = X_{1:2^(t-1) ∧ J}, i.e., the first min(2^(t-1), J) molecules, by randomly sampling a molecule x from X_t as the context on round t. The end of a round is marked by reaching the desired performance. The doubling curriculum is designed to balance learning and forgetting, since there is always a 1/2 probability of sampling molecules from earlier rounds (see Algorithm 1 in the appendix). A minimal sampling sketch appears after this table.
Open Source Code | Yes | Our code is available at https://github.com/tarungog/torsionnet_paper_version.
Open Datasets | No | The paper describes generating its own datasets for branched alkanes ('We created a script to randomly generate molecular graphs of branched alkanes') and lignin ('We adapted a method to generate instances of the biopolymer family of lignins [31]'), but does not provide concrete access information (e.g., a direct download link, DOI, or repository) for the specific dataset instances used in its experiments. While it references the method for lignin generation, it does not provide the generated data itself.
Dataset Splits | No | The paper states: 'The validation environment consists of a single 10 torsion alkane unseen at train time.' and 'The validation and test molecules are each unique 8-lignins.' While it identifies the molecules used for validation, it does not provide specific split percentages or sample counts for the overall dataset partitioning.
Hardware Specification | No | The paper mentions 'All methods are run on CPU at test time' but does not specify any particular CPU models, GPU models, memory, or other hardware specifications used for experiments (either training or testing).
Software Dependencies | No | The paper mentions using the OpenAI Gym framework, RDKit, the classical force field MMFF94, and a 'modular deep RL framework [42]' (which implies PyTorch), but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | We train the model using a fixed episodic length K, chosen per environment based on the number of torsions of the target molecule(s). All molecules use sampling horizon K = 200. The Gibbs score reward for the lignin environment features high variance across several orders of magnitude, even at very high temperatures (τ = 2000 K)... Initial conformers for the alkane environments are sampled from RDKit, and the distance threshold m is set to 0.05 TFD. Initial conformers for the lignin environments are sampled from Open Babel, and the distance threshold m is set to 0.15 TFD. A conformer-pruning sketch based on these settings follows the table.
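As a rough illustration of the doubling curriculum quoted in the Pseudocode row, the sketch below draws training molecules round by round from a pool that doubles each round. The names (molecules, desired_performance_reached, num_rounds) are illustrative assumptions and are not taken from the authors' released code.

```python
# Minimal sketch of the doubling curriculum described in the Pseudocode row.
# Assumptions: `molecules` is a list ordered from easy to hard, and
# `desired_performance_reached(t)` is a hypothetical callback that signals
# when round t may end; neither name comes from the paper's repository.
import random

def doubling_curriculum(molecules, desired_performance_reached, num_rounds):
    J = len(molecules)
    for t in range(1, num_rounds + 1):
        # Round t samples uniformly from the first min(2^(t-1), J) molecules,
        # so molecules introduced in earlier rounds keep roughly half the mass.
        pool = molecules[: min(2 ** (t - 1), J)]
        while not desired_performance_reached(t):
            x = random.choice(pool)
            yield t, x  # caller runs one training episode with context molecule x
```

Because the sampling pool doubles every round, earlier molecules are never dropped entirely, which is the learning-versus-forgetting balance the quoted passage describes.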
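The Experiment Setup row mentions initial conformer sampling, MMFF94 relaxation, a Gibbs-score-style reward at temperature τ, and a TFD distance threshold m for discarding near-duplicate conformers. The sketch below strings these pieces together with RDKit only as a rough stand-in: the helper name, the random seed, and the unnormalized Boltzmann sum used in place of the paper's Gibbs score are assumptions, not the authors' implementation (and the paper samples lignin starting conformers from Open Babel rather than RDKit).

```python
# Illustrative sketch (not the paper's code): embed conformers, relax with
# MMFF94, prune near-duplicates by torsion fingerprint deviation (TFD), and
# score the survivors with an unnormalized Boltzmann sum as a Gibbs-score proxy.
import math
from rdkit import Chem
from rdkit.Chem import AllChem, TorsionFingerprints

TAU = 2000.0          # temperature (K) quoted for the lignin environment
KB = 0.0019872041     # Boltzmann constant in kcal/(mol*K), matching MMFF94 energies
TFD_THRESHOLD = 0.05  # distance threshold m for alkanes (0.15 for lignins)

def pruned_boltzmann_score(smiles, num_confs=50):
    """Return a Boltzmann-weighted score over TFD-pruned MMFF94 conformers."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, randomSeed=1))
    energies = [e for _, e in AllChem.MMFFOptimizeMoleculeConfs(mol)]  # kcal/mol

    kept = []  # conformers farther than TFD_THRESHOLD from every kept conformer
    for i in conf_ids:
        is_duplicate = any(
            TorsionFingerprints.GetTFDBetweenConformers(mol, [i], [j])[0] < TFD_THRESHOLD
            for j in kept
        )
        if not is_duplicate:
            kept.append(i)

    e_min = min(energies[i] for i in kept)
    return sum(math.exp(-(energies[i] - e_min) / (KB * TAU)) for i in kept)

print(pruned_boltzmann_score("CCC(C)CC(C)CC"))  # small branched alkane example
```

Raising TFD_THRESHOLD to 0.15 mirrors the lignin setting quoted above; the high-variance behavior of the reward at τ = 2000 K comes from the exponential energy weighting in the final sum.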