TorsionNet: A Reinforcement Learning Approach to Sequential Conformer Search

Authors: Tarun Gogineni, Ziping Xu, Exequiel Punzalan, Runxuan Jiang, Joshua Kammeraad, Ambuj Tewari, Paul Zimmerman

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results show that TorsionNet outperforms the highest-scoring chemoinformatics method by 4x on large branched alkanes, and by several orders of magnitude on the previously unexplored biopolymer lignin, with applications in renewable energy.
Researcher Affiliation | Academia | Department of EECS, University of Michigan; Department of Statistics, University of Michigan; Department of Chemistry, University of Michigan. {tgog,zipingxu,epunzal,runxuanj,joshkamm,tewaria,paulzim}@umich.edu
Pseudocode | Yes | Our doubling curriculum trains on the set X_t = X_{1:2^(t-1) ∧ J}, i.e., the first min(2^(t-1), J) molecules, by randomly sampling a molecule x from X_t as the context on round t. The end of a round is marked by reaching the desired performance. The doubling curriculum is designed to balance learning and forgetting, since there is always a 1/2 probability of sampling molecules from earlier rounds (see Algorithm 1 in the appendix). A minimal sampling sketch appears after this table.
Open Source Code | Yes | Our code is available at https://github.com/tarungog/torsionnet_paper_version.
Open Datasets | No | The paper describes generating its own datasets for branched alkanes ('We created a script to randomly generate molecular graphs of branched alkanes') and lignin ('We adapted a method to generate instances of the biopolymer family of lignins [31]'), but does not provide concrete access information (e.g., a direct download link, DOI, or repository) for the specific dataset instances used in its experiments. While it references the method for lignin generation, it does not provide the generated data itself.
Dataset Splits | No | The paper states: 'The validation environment consists of a single 10 torsion alkane unseen at train time.' and 'The validation and test molecules are each unique 8-lignins.' While it identifies the molecules used for validation, it does not provide specific split percentages or sample counts for the overall dataset partitioning.
Hardware Specification | No | The paper mentions 'All methods are run on CPU at test time' but does not specify any particular CPU models, GPU models, memory, or other hardware specifications used for experiments (either training or testing).
Software Dependencies | No | The paper mentions using the OpenAI Gym framework, RDKit, the classical force field MMFF94, and a 'modular deep RL framework [42]' (which implies PyTorch), but it does not provide specific version numbers for any of these software dependencies.
Experiment Setup | Yes | We train the model using a fixed episodic length K, chosen per environment based on the number of torsions of the target molecule(s). All molecules use sampling horizon K = 200. The Gibbs score reward for the lignin environment features high variance across several orders of magnitude, even at very high temperatures (τ = 2000 K)... Initial conformers for the alkane environments are sampled from RDKit, and the distance threshold m is set to 0.05 TFD. Initial conformers for the lignin environments are sampled from Open Babel, and the distance threshold m is set to 0.15 TFD. A conformer-pruning sketch based on these settings follows the table.
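As a rough illustration of the doubling curriculum quoted in the Pseudocode row, the sketch below draws training molecules round by round from a pool that doubles each round. The names (molecules, desired_performance_reached, num_rounds) are illustrative assumptions and are not taken from the authors' released code.

```python
# Minimal sketch of the doubling curriculum described in the Pseudocode row.
# Assumptions: `molecules` is a list ordered from easy to hard, and
# `desired_performance_reached(t)` is a hypothetical callback that signals
# when round t may end; neither name comes from the paper's repository.
import random

def doubling_curriculum(molecules, desired_performance_reached, num_rounds):
    J = len(molecules)
    for t in range(1, num_rounds + 1):
        # Round t samples uniformly from the first min(2^(t-1), J) molecules,
        # so molecules introduced in earlier rounds keep roughly half the mass.
        pool = molecules[: min(2 ** (t - 1), J)]
        while not desired_performance_reached(t):
            x = random.choice(pool)
            yield t, x  # caller runs one training episode with context molecule x
```

Because the sampling pool doubles every round, earlier molecules are never dropped entirely, which is the learning-versus-forgetting balance the quoted passage describes.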
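The Experiment Setup row mentions initial conformer sampling, MMFF94 relaxation, a Gibbs-score-style reward at temperature τ, and a TFD distance threshold m for discarding near-duplicate conformers. The sketch below strings these pieces together with RDKit only as a rough stand-in: the helper name, the random seed, and the unnormalized Boltzmann sum used in place of the paper's Gibbs score are assumptions, not the authors' implementation (and the paper samples lignin starting conformers from Open Babel rather than RDKit).

```python
# Illustrative sketch (not the paper's code): embed conformers, relax with
# MMFF94, prune near-duplicates by torsion fingerprint deviation (TFD), and
# score the survivors with an unnormalized Boltzmann sum as a Gibbs-score proxy.
import math
from rdkit import Chem
from rdkit.Chem import AllChem, TorsionFingerprints

TAU = 2000.0          # temperature (K) quoted for the lignin environment
KB = 0.0019872041     # Boltzmann constant in kcal/(mol*K), matching MMFF94 energies
TFD_THRESHOLD = 0.05  # distance threshold m for alkanes (0.15 for lignins)

def pruned_boltzmann_score(smiles, num_confs=50):
    """Return a Boltzmann-weighted score over TFD-pruned MMFF94 conformers."""
    mol = Chem.AddHs(Chem.MolFromSmiles(smiles))
    conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=num_confs, randomSeed=1))
    energies = [e for _, e in AllChem.MMFFOptimizeMoleculeConfs(mol)]  # kcal/mol

    kept = []  # conformers farther than TFD_THRESHOLD from every kept conformer
    for i in conf_ids:
        is_duplicate = any(
            TorsionFingerprints.GetTFDBetweenConformers(mol, [i], [j])[0] < TFD_THRESHOLD
            for j in kept
        )
        if not is_duplicate:
            kept.append(i)

    e_min = min(energies[i] for i in kept)
    return sum(math.exp(-(energies[i] - e_min) / (KB * TAU)) for i in kept)

print(pruned_boltzmann_score("CCC(C)CC(C)CC"))  # small branched alkane example
```

Raising TFD_THRESHOLD to 0.15 mirrors the lignin setting quoted above; the high-variance behavior of the reward at τ = 2000 K comes from the exponential energy weighting in the final sum.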