Discrete Object Generation with Reversible Inductive Construction
Authors: Ari Seff, Wenda Zhou, Farhan Damani, Abigail Doyle, Ryan P. Adams
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed approach on two highly structured discrete domains, molecules and Laman graphs, and find that it compares favorably to alternative methods at capturing distributional statistics for a host of semantically relevant metrics. Quantitative evaluation indicates that the proposed method can effectively model highly structured discrete distributions while adhering to strict validity constraints. |
| Researcher Affiliation | Academia | Ari Seff, Princeton University, Princeton, NJ, aseff@princeton.edu; Wenda Zhou, Columbia University, New York, NY, wz2335@columbia.edu; Farhan Damani, Princeton University, Princeton, NJ, fdamani@princeton.edu; Abigail Doyle, Princeton University, Princeton, NJ, agdoyle@princeton.edu; Ryan P. Adams, Princeton University, Princeton, NJ, rpa@princeton.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks for its own method. |
| Open Source Code | Yes | We formulate our approach, generative reversible inductive construction (GenRIC)¹, as the equilibrium distribution of a Markov chain that only visits valid objects, without a need for inefficient rejection sampling. ¹ https://github.com/PrincetonLIPS/reversible-inductive-construction |
| Open Datasets | Yes | For molecules we test the proposed approach on the ZINC dataset, which contains about 250K drug-like molecules from the ZINC database [35]. For Laman graphs, we generate synthetic graphs randomly via Algorithm 7 from Moussaoui [29], originally proposed for evaluating geometric constraint solvers embedded within CAD programs. |
| Dataset Splits | Yes | The model is trained on 220K molecules according to the same train/test split as in Jin et al. [19], Kusner et al. [21]. |
| Hardware Specification | No | We acknowledge computing resources from Columbia University's Shared Research Computing Facility project, which is supported by NIH Research Facility Improvement Grant 1G20RR030893-01, and associated funds from the New York State Empire State Development, Division of Science Technology and Innovation (NYSTAR) Contract C090171, both awarded April 15, 2010. This statement describes funding and a facility but lacks specific hardware details (e.g., GPU/CPU models). |
| Software Dependencies | No | The paper mentions software like 'RDKit [24]' but does not provide specific version numbers for software dependencies used in their own experiments. |
| Experiment Setup | No | Unless otherwise stated, the results reported in Sections 3 and 4, use a geometric distribution with five expected steps for the corruption sequence length. For each method, we obtain 20K samples either by running pre-trained models [19, 14, 21], by accessing pre-sampled sets [26, 34, 25], or by training models from scratch [33]. While some details are given, comprehensive hyperparameter values (e.g., learning rate, batch size, specific optimizer settings) are not provided in the main text. |
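
The one concrete setting quoted in the Experiment Setup row, a geometric distribution with five expected steps for the corruption sequence length, can be illustrated with a minimal sketch. This is not the authors' code: the function name `sample_corruption_length`, the choice of support {1, 2, ...}, and the success probability p = 1/5 are assumptions made here so that the mean equals five. The paper may parametrize the distribution differently.

```python
import random


def sample_corruption_length(expected_steps=5.0, rng=None):
    """Draw a step count from a geometric distribution on {1, 2, ...}.

    Assumption (not from the paper): the support starts at 1, so a
    success probability of 1 / expected_steps gives the requested mean.
    """
    rng = rng or random.Random()
    p = 1.0 / expected_steps
    steps = 1
    # Keep adding steps until a "stop" event occurs, which has probability p.
    while rng.random() >= p:
        steps += 1
    return steps


# Empirical check: the sample mean should be close to five.
rng = random.Random(0)
lengths = [sample_corruption_length(5.0, rng) for _ in range(20000)]
mean_length = sum(lengths) / len(lengths)
```

Under these assumptions every draw is at least one step, and the average over many draws approaches the stated expectation of five.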