Implicitly Guided Design with PropEn: Match your Data to Follow the Gradient

Authors: Nataša Tagasovska, Vladimir Gligorijevic, Kyunghyun Cho, Andreas Loukas

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations in toy problems and scientific applications, such as therapeutic protein design and airfoil optimization, demonstrate PropEn's advantages over common baselines. Notably, the protein design results are validated with wet lab experiments, confirming the competitiveness and effectiveness of our approach.
Researcher Affiliation | Collaboration | Prescient/MLDD, Genentech Research and Early Development; Department of Computer Science and Center for Data Science, New York University
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/prescient-design/propen.
Open Datasets | No | The paper mentions datasets such as NACA airfoils and therapeutic antibody proteins, but it does not provide concrete access information (link, DOI, specific repository, or citation with author/year for public access) for them. It also mentions synthetic toy datasets, which are generated rather than sourced externally.
Dataset Splits | No | The paper states "We randomly select 0.1% as holdout dataset for seeds, and use the rest for training." and discusses "wet lab validation" as an experimental outcome, but it does not explicitly define a separate validation split (with percentages or counts) used for model tuning during training.
Hardware Specification | No | The paper does not provide specifics on the hardware used, such as GPU/CPU models, memory, or cloud instance types; it only refers to running experiments in general terms.
Software Dependencies | No | The paper does not provide specific version numbers for software dependencies (e.g., Python 3.x, PyTorch 1.x, or specific solver versions). It mentions using "NeuralFoil [41]", but without a specific version.
Experiment Setup | Yes | Ablation studies: N ∈ {100, 200}, d ∈ {2, 10, 50, 100}; matching thresholds: Δx = Δy = 1; number of epochs: 500; batch size: 64 ... Ablation studies: N ∈ {200, 500, 1000}; matching thresholds: Δx = Δy ∈ {0.3, 0.5, 0.7, 1}; number of epochs: 1000; batch size: 100 ... Hyper-parameter choice: for optimizing the parameters of the baselines in the toy and engineering experiments, a grid search was conducted over the learning rate ([1e-2, 1e-5]), weight decay ([1e-2, 1e-5]), number of epochs ([300, 1000, 5000]), batch size (32, 64, 128), and number of neurons per layer ([30, 50, 100]). Therapeutic proteins: batch size 32, training epochs 300.
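For readers trying to reproduce the setup, the "matching thresholds" quoted above refer to PropEn's matched-dataset construction: each training example is paired with nearby examples whose property value is strictly better, and a model is trained to map each example to its matched, improved partner. The sketch below is a minimal illustration under that reading; build_matched_dataset, dx, and dy are hypothetical names, not identifiers from the released code at https://github.com/prescient-design/propen.

```python
import numpy as np

def build_matched_dataset(X, y, dx=1.0, dy=1.0):
    """Minimal sketch of PropEn-style matching (hypothetical helper).

    Keeps a pair (x_i, x_j) when the designs are close,
    ||x_i - x_j|| <= dx, and the property improves by at most dy,
    0 < y_j - y_i <= dy. A model trained to map x_i -> x_j then
    takes one implicit step "up" the property gradient.
    """
    src, tgt = [], []
    for i in range(len(X)):
        for j in range(len(X)):
            if i == j:
                continue
            close = np.linalg.norm(X[i] - X[j]) <= dx
            better = 0.0 < (y[j] - y[i]) <= dy
            if close and better:
                src.append(X[i])
                tgt.append(X[j])
    return np.asarray(src), np.asarray(tgt)
```

At design time, the paper applies the trained model iteratively to a seed design, so each forward pass acts as one step of implicitly guided optimization; the thresholds trade off the size of the matched dataset against how local each step is.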
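The baseline hyper-parameter search quoted in the last row is a plain grid search. A short sketch follows, assuming the bracketed pairs denote candidate values (they could equally be read as ranges); GRID mirrors the values from the row above, while grid_search and train_and_eval are illustrative stand-ins for the actual training loop, not code from the paper.

```python
from itertools import product

# Candidate values taken from the Experiment Setup row; the rest is illustrative.
GRID = {
    "lr": [1e-2, 1e-5],
    "weight_decay": [1e-2, 1e-5],
    "epochs": [300, 1000, 5000],
    "batch_size": [32, 64, 128],
    "hidden_units": [30, 50, 100],
}

def grid_search(train_and_eval):
    """Return the configuration with the best score over the full grid."""
    best_cfg, best_score = None, float("-inf")
    for values in product(*GRID.values()):
        cfg = dict(zip(GRID.keys(), values))
        score = train_and_eval(**cfg)  # caller-supplied: trains a baseline, returns a score
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```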