Deep Extrapolation for Attribute-Enhanced Generation
Authors: Alvin Chan, Ali Madani, Ben Krause, Nikhil Naik
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate GENhance in two data domains. First, we use the Stanford Sentiment Treebank (SST), a natural language benchmark containing movie reviews with five discrete sentiment attributes (Socher et al., 2013), to show that GENhance generates strongly positive reviews, after training with no positive examples. Second, we develop a protein stability dataset for the ACE2 protein (Chan et al., 2020) with a change in free energy (ddG) continuous attribute, and show that GENhance can generate protein sequences with higher stability than the training set (Fig. 1-(right)). GENhance significantly outperforms baseline methods based on (i) a generator-discriminator model with rejection sampling and (ii) an algorithm using Metropolis-Hastings Markov chain Monte Carlo sampling with a trained discriminator. GENhance's performance is further improved when provided access to a few examples with attribute scores beyond the training distribution. *(Sketches of both baselines follow the table.)* |
| Researcher Affiliation | Collaboration | Alvin Chan* (Salesforce Research, NTU); Ali Madani* (Salesforce Research); Ben Krause (Salesforce Research); Nikhil Naik (Salesforce Research) |
| Pseudocode | No | The paper describes the methods in prose and equations but does not include formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | We release our benchmark tasks and models to contribute to the study of generative modeling extrapolation and data-driven design in biology and chemistry: https://github.com/salesforce/genhance. |
| Open Datasets | Yes | First, we use the Stanford Sentiment Treebank (SST), a natural language benchmark containing movie reviews with five discrete sentiment attributes (Socher et al., 2013)... Second, we develop a protein stability dataset for the ACE2 protein (Chan et al., 2020)... |
| Dataset Splits | Yes | The discriminator model for both the Gen-Disc and MCMC baselines is trained by finetuning the pretrained encoder on the full set of 250K sequences, with a random 10% used as the validation set, while the generator modules of both Gen-Disc and GENhance are trained by finetuning the whole pretrained encoder-decoder model. *(A split sketch follows the table.)* |
| Hardware Specification | Yes | Further details on training settings on four NVIDIA A100 GPUs are found in Supplement A.1 to A.3. |
| Software Dependencies | No | The paper mentions using T5-base, BERT-large, GPT-2, and FoldX but does not specify their versions or any other software dependencies with version numbers. |
| Experiment Setup | Yes | This oracle model is trained with a batch size of 32 for 30 epochs and achieves an accuracy of 92.5% for strong-positive vs neutral/negative classification. z∥ perturbations of magnitude equal to 5% of the standard deviation of the training samples' z∥ are used for all GENhance generations. *(A perturbation sketch follows the table.)* |
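
For orientation, here is a minimal sketch of the two baselines quoted in the Research Type row: discriminator-guided Metropolis-Hastings sampling and generator-discriminator filtering. This is an illustrative reconstruction, not the authors' released code; `propose`, `generate`, and `score` are placeholder functions, and the exponential acceptance rule is an assumption about how the discriminator score enters the MH ratio.

```python
import math
import random

def metropolis_hastings(seq, score, propose, steps=1000, temperature=1.0):
    """Discriminator-guided MH sampling: treat p(x) as proportional to
    exp(score(x)/T) and accept a symmetric proposal with probability
    min(1, exp((s_cand - s)/T))."""
    s = score(seq)
    for _ in range(steps):
        cand = propose(seq)                 # e.g. mutate one residue/token
        s_cand = score(cand)
        if math.log(random.random() + 1e-12) < (s_cand - s) / temperature:
            seq, s = cand, s_cand           # accept the candidate
    return seq, s

def generate_and_filter(generate, score, n_draw=1000, n_keep=10):
    """Gen-Disc style baseline: draw many candidates from a generator and
    keep the top-scoring ones under the trained discriminator."""
    return sorted((generate() for _ in range(n_draw)), key=score, reverse=True)[:n_keep]

# Toy usage: maximize the count of 'A' in a short protein-like string.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

def mutate_one(seq):
    i = random.randrange(len(seq))
    return seq[:i] + random.choice(ALPHABET) + seq[i + 1:]

best, best_score = metropolis_hastings("MKTV" * 5, lambda x: x.count("A"),
                                       mutate_one, steps=500)
```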
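
The 90/10 split quoted in the Dataset Splits row is straightforward; below is a minimal sketch, assuming a fixed-seed shuffle and a split at the sequence level (both assumptions, as the paper does not state them).

```python
import random

def split_dataset(sequences, val_frac=0.10, seed=0):
    """Random 10% validation split over the training sequences."""
    idx = list(range(len(sequences)))
    random.Random(seed).shuffle(idx)
    n_val = int(len(idx) * val_frac)
    train = [sequences[i] for i in idx[n_val:]]   # 90% train
    val = [sequences[i] for i in idx[:n_val]]     # 10% validation
    return train, val
```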
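
Finally, the 5% z∥ perturbation quoted in the Experiment Setup row can be pictured as below. This is a hedged sketch, not the released GENhance implementation; the split of the latent into an attribute-aligned part z∥ and the unit `direction` along which the attribute improves are assumptions about the paper's setup.

```python
import torch

@torch.no_grad()
def perturb_z_parallel(z_par, z_par_train, direction, scale=0.05):
    """Shift a sample's attribute-aligned latent z_par along `direction` by
    `scale` (5% in the paper) of the training samples' z_par standard deviation."""
    step = scale * z_par_train.std()                    # std over training latents
    return z_par + step * direction / direction.norm()  # unit-length direction

# Toy usage with random latents standing in for encoder outputs.
z_train = torch.randn(1000, 8)     # z_par of training samples (hypothetical)
z_new = perturb_z_parallel(torch.randn(8), z_train, torch.ones(8))
```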