Learning to Make Decisions via Submodular Regularization

Authors: Ayya Alieva, Aiden Aceves, Jialin Song, Stephen Mayo, Yisong Yue, Yuxin Chen

ICLR 2021

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the supporting LLM response quoted from the paper.
Research Type: Experimental
LLM Response: "We demonstrate the performance of our algorithm on a variety of batched and sequential optimization tasks, including set cover, active learning, and data-driven protein engineering."
Researcher Affiliation: Academia
LLM Response: Ayya Alieva (Stanford University, ayya@stanford.edu); Aiden Aceves (Caltech, aaceves@caltech.edu); Jialin Song (Caltech, jssong@caltech.edu); Stephen Mayo (Caltech, steve@caltech.edu); Yisong Yue (Caltech, yyue@caltech.edu); Yuxin Chen (University of Chicago, chenyuxin@uchicago.edu)
Pseudocode: Yes
LLM Response: "Algorithm 1: Learning to make decisions via Submodular Regularization (LEASURE)"
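The report only quotes the algorithm's title, so as a reading aid, here is a minimal Python sketch of what a DAgger-style imitation loop with a submodular regularizer could look like, assuming the beta-mixed rollout and the λ-weighted regularizer mentioned in the Experiment Setup entry below. The helpers `expert_action`, `greedy_action`, and `submodular_penalty`, the environment API, and the loss form are all assumptions, not the authors' exact Algorithm 1.

```python
# Hypothetical sketch of a DAgger-style loop with submodular regularization,
# reconstructed only from details quoted in this report. Helper names and
# the environment API are assumptions, not the paper's Algorithm 1.
import random
import torch
import torch.nn.functional as F

def train_leasure_sketch(score_net, expert_action, submodular_penalty,
                         environments, n_iters, lam):
    opt = torch.optim.Adam(score_net.parameters(), lr=1e-3)
    replay = []  # aggregated (state, expert action) pairs, DAgger-style
    for i in range(n_iters):
        # mixing weight beta: starts high and decays toward 0 over n_iters
        beta = 0.8 * (1.0 - i / n_iters)
        for env in environments:
            state, done = env.reset(), False
            while not done:
                # roll in with a mixture of the expert and the learned policy
                if random.random() < beta:
                    action = expert_action(state)
                else:
                    action = score_net.greedy_action(state)
                replay.append((state, expert_action(state)))
                state, done = env.step(action)
        # refit the score network on the aggregated data, adding the
        # submodular regularization term weighted by lambda
        for state, a_star in replay:
            scores = score_net(state)  # one score per candidate action
            loss = F.cross_entropy(scores.unsqueeze(0),
                                   torch.tensor([a_star]))
            loss = loss + lam * submodular_penalty(score_net, state)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return score_net
```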
Open Source Code: No
LLM Response: The paper contains no statement that the authors are releasing the code for this work, and it provides no link to a source-code repository.
Open Datasets: Yes
LLM Response: "Our dataset is the subset of the Mushroom dataset (Lim, 2015), consisting of 1000 sets. Each set contains 23 mushroom species, and there are a total of 119 species." "In this experiment, the set S is the Fashion-MNIST dataset consisting of greyscale images from one of 10 clothes classes (Xiao et al., 2017)." "By employing a large protein engineering database containing mutation-function data (Wang et al., 2019), we demonstrate that LEASURE enables the learning of an optimal policy for imitating expert design of protein sequences" (see the paper's appendix for a detailed discussion of the datasets).
Dataset Splits: Yes
LLM Response: "For training, we used an initially unlabelled dataset with 60000 images, 2000 of which were set aside to use for evaluating validation accuracy."
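Since the split is fully specified (60,000 images, 2,000 held out for validation), it is easy to reproduce; the sketch below assumes torchvision's Fashion-MNIST loader and an arbitrary seed, neither of which is stated in the paper.

```python
# Sketch of the reported split: 60,000 training images, 2,000 of which
# are set aside for validation. The torchvision loader and the fixed
# seed are assumptions.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

full_train = datasets.FashionMNIST(root="data", train=True, download=True,
                                   transform=transforms.ToTensor())
generator = torch.Generator().manual_seed(0)  # assumed seed
pool, val = random_split(full_train, [58_000, 2_000], generator=generator)
print(len(pool), len(val))  # 58000 2000
```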
Hardware Specification: Yes
LLM Response: "The training of a policy using these settings takes 36 hours on a modern multiprocessor computer equipped with an NVIDIA Titan V GPU."
Software Dependencies: No
LLM Response: The paper does not name the ancillary software (e.g., libraries or solvers) with version numbers needed to replicate the experiments.
Experiment Setup: Yes
LLM Response: "For both tasks, the score networks are trained using ADAM with a learning rate of 1e-3." The beta parameter from Line 2 of LEASURE was chosen, somewhat arbitrarily, to be 4/5; experiments showed the exact value did not matter as long as it starts at no less than 1/2 and decays toward almost 0 after N iterations. The λ and γ parameters were picked using a hyperparameter sweep in log space.
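These settings translate directly into code. The sketch below mirrors the quoted choices: Adam at learning rate 1e-3, a beta that starts at 4/5 and decays toward 0 over N iterations, and a log-space sweep for λ and γ. The linear decay shape and the sweep bounds are assumptions; the paper does not report them.

```python
# Sketch of the quoted training settings. The decay shape and the sweep
# bounds are assumed, not reported by the paper.
import numpy as np
import torch

def make_optimizer(score_net):
    # score networks trained with Adam at learning rate 1e-3, as quoted
    return torch.optim.Adam(score_net.parameters(), lr=1e-3)

def beta_schedule(i, n_iters, beta0=0.8):
    # starts at 4/5 (>= 1/2) and degrades toward almost 0 after N iterations;
    # the linear shape is an assumption
    return beta0 * (1.0 - i / n_iters)

# hyperparameter sweep in log space for lambda and gamma (bounds assumed)
lambdas = np.logspace(-4, 0, num=5)
gammas = np.logspace(-4, 0, num=5)
candidates = [(lam, gam) for lam in lambdas for gam in gammas]
```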