Deep Extrapolation for Attribute-Enhanced Generation
Authors: Alvin Chan, Ali Madani, Ben Krause, Nikhil Naik
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate GENhance in two data domains. First, we use the Stanford Sentiment Treebank (SST), a natural language benchmark containing movie reviews with five discrete sentiment attributes (Socher et al., 2013), to show that GENhance generates strongly positive reviews, after training with no positive examples. Second, we develop a protein stability dataset for the ACE2 protein (Chan et al., 2020) with a change in free energy (ddG) continuous attribute, and show that GENhance can generate protein sequences with higher stability than the training set (Fig. 1-(right)). GENhance significantly outperforms baseline methods based on (i) a generator-discriminator model with rejection sampling and (ii) an algorithm using Metropolis-Hastings Markov chain Monte Carlo sampling with a trained discriminator. GENhance's performance is further improved when provided access to a few examples with attribute scores beyond the training distribution. *(Sketches of both baselines follow the table.)* |
| Researcher Affiliation | Collaboration | Alvin Chan* (Salesforce Research, NTU); Ali Madani* (Salesforce Research); Ben Krause (Salesforce Research); Nikhil Naik (Salesforce Research) |
| Pseudocode | No | The paper describes the methods in prose and equations but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our benchmark tasks and models to contribute to the study of generative modeling extrapolation and data-driven design in biology and chemistry: https://github.com/salesforce/genhance. |
| Open Datasets | Yes | First, we use the Stanford Sentiment Treebank (SST), a natural language benchmark containing movie reviews with five discrete sentiment attributes (Socher et al., 2013)... Second, we develop a protein stability dataset for the ACE2 protein (Chan et al., 2020)... |
| Dataset Splits | Yes | The discriminator model for both the Gen-Disc and MCMC baselines is trained by finetuning the pretrained encoder on the full set of 250K sequences, with a random 10% used as the validation set, while the generator modules of both Gen-Disc and GENhance are trained by finetuning the whole pretrained encoder-decoder model. *(A split sketch follows the table.)* |
| Hardware Specification | Yes | Further details on training settings on four NVIDIA A100 GPUs are found in Supplement A.1 to A.3. |
| Software Dependencies | No | The paper mentions using T5-base, BERT-large, GPT-2, and FoldX but does not specify their versions or any other software dependencies with version numbers. |
| Experiment Setup | Yes | This oracle model is trained with a batch size of 32 for 30 epochs and achieves an accuracy of 92.5% for strong-positive vs neutral/negative classification. z∥ perturbations of magnitude equal to 5% of the standard deviation of the training samples' z∥ are used for all GENhance generations. *(A perturbation sketch follows the table.)* |
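
For orientation, here is a minimal sketch of the two baselines quoted in the Research Type row: discriminator-guided Metropolis-Hastings sampling and generator-discriminator filtering. This is an illustrative reconstruction, not the authors' released code; `propose`, `generate`, and `score` are placeholder functions, and the exponential acceptance rule is an assumption about how the discriminator score enters the MH ratio.

```python
import math
import random

def metropolis_hastings(seq, score, propose, steps=1000, temperature=1.0):
    """Discriminator-guided MH sampling: treat p(x) as proportional to
    exp(score(x)/T) and accept a symmetric proposal with probability
    min(1, exp((s_cand - s)/T))."""
    s = score(seq)
    for _ in range(steps):
        cand = propose(seq)                 # e.g. mutate one residue/token
        s_cand = score(cand)
        if math.log(random.random() + 1e-12) < (s_cand - s) / temperature:
            seq, s = cand, s_cand           # accept the candidate
    return seq, s

def generate_and_filter(generate, score, n_draw=1000, n_keep=10):
    """Gen-Disc style baseline: draw many candidates from a generator and
    keep the top-scoring ones under the trained discriminator."""
    return sorted((generate() for _ in range(n_draw)), key=score, reverse=True)[:n_keep]

# Toy usage: maximize the count of 'A' in a short protein-like string.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def mutate_one(seq):
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(ALPHABET) + seq[i + 1:]

best, best_score = metropolis_hastings("MKTV" * 5, lambda x: x.count("A"),
                                       mutate_one, steps=500)
```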
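
The 90/10 split quoted in the Dataset Splits row is straightforward; below is a minimal sketch, assuming a fixed-seed shuffle and a split at the sequence level (both assumptions, as the paper does not state them).

```python
import random

def split_dataset(sequences, val_frac=0.10, seed=0):
    """Random 10% validation split over the training sequences."""
    idx = list(range(len(sequences)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_frac)
    train = [sequences[i] for i in idx[n_val:]]   # 90% train
    val = [sequences[i] for i in idx[:n_val]]     # 10% validation
    return train, val
```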
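
Finally, the 5% z∥ perturbation quoted in the Experiment Setup row can be pictured as below. This is a hedged sketch, not the released GENhance implementation; the split of the latent into an attribute-aligned part z∥ and the unit `direction` along which the attribute improves are assumptions about the paper's setup.

```python
import torch

@torch.no_grad()
def perturb_z_parallel(z_par, z_par_train, direction, scale=0.05):
    """Shift a sample's attribute-aligned latent z_par along `direction` by
    `scale` (5% in the paper) of the training samples' z_par standard deviation."""
    step = scale * z_par_train.std()                    # std over training latents
    return z_par + step * direction / direction.norm()  # unit-length direction

# Toy usage with random latents standing in for encoder outputs.
z_train = torch.randn(1000, 8)     # z_par of training samples (hypothetical)
z_new = perturb_z_parallel(torch.randn(8), z_train, torch.ones(8))
```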