Deep Submodular Functions: Definitions and Learning

Authors: Brian W. Dolhansky, Jeff A. Bilmes

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We offer preliminary feasibility results showing it is possible to train a DSF on synthetic datasets and, via featurization, on a real image summarization dataset.
Researcher Affiliation | Academia | Brian Dolhansky <bdol@cs.washington.edu>, Dept. of Computer Science and Engineering, University of Washington, Seattle, WA 98105; Jeff Bilmes <bilmes@uw.edu>, Dept. of Electrical Engineering, University of Washington, Seattle, WA 98105
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that code for the described methodology is open source.
Open Datasets | Yes | For our real-world instance of learning DSFs, we use the dataset of [27], which consists of 14 distinct image sets, 100 images each. [27] is: S. Tschiatschek, R. Iyer, H. Wei, and J. Bilmes. Learning mixtures of submodular functions for image collection summarization. In Neural Information Processing Systems (NIPS), Montreal, Canada, December 2014.
Dataset Splits | No | The paper mentions training on 13 sets and testing on one, but does not explicitly specify a separate validation split or how validation was performed during training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using Adagrad but does not specify software names with version numbers for libraries, frameworks, or programming languages.
Experiment Setup | Yes | We used a simple two-layer DSF, where the first hidden layer consisted of four hidden units with square root activation functions, and a normalized sigmoid σ̂(x) = 2(σ(x) - 0.5) at the output. A DSF is trained with a hidden layer of 10 units with activation g(x) = max(x, 1), and a normalized sigmoid σ̂ at the output. We used (diagonalized) Adagrad, a decaying learning rate, weight decay, and dropout (which was critical for test-set performance).
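The Experiment Setup row quotes a two-layer DSF with square-root activations on four hidden units and a normalized sigmoid σ̂(x) = 2(σ(x) - 0.5) at the output. As a rough illustration only, and not the authors' code, the sketch below evaluates such a DSF on one subset of a small ground set: nonnegative first-layer weights feed concave, non-decreasing activations, which is what keeps the resulting set function monotone submodular. The names (dsf_value, normalized_sigmoid, M1, w2) and the ground-set and hidden-layer sizes are hypothetical, and the training details quoted above (Adagrad, decaying learning rate, weight decay, dropout) are omitted.

```python
import numpy as np

def normalized_sigmoid(x):
    # Normalized sigmoid from the quoted setup: sigma_hat(x) = 2 * (sigma(x) - 0.5).
    # On x >= 0 it is concave, non-decreasing, and maps into [0, 1).
    return 2.0 * (1.0 / (1.0 + np.exp(-x)) - 0.5)

def dsf_value(indicator, M1, w2):
    """Evaluate a two-layer DSF on one subset of the ground set (illustrative only).

    indicator : 0/1 vector of length |V| marking which elements are in the subset.
    M1        : nonnegative (num_hidden, |V|) matrix; row j holds the modular
                weights feeding hidden unit j (hypothetical name and shape).
    w2        : nonnegative (num_hidden,) vector of output-layer weights.
    """
    hidden = np.sqrt(M1 @ indicator)        # square-root activations (four units in the paper)
    return normalized_sigmoid(w2 @ hidden)  # normalized sigmoid at the output

# Toy usage with made-up sizes: 6-element ground set, 4 hidden units.
rng = np.random.default_rng(0)
M1 = rng.uniform(0.0, 1.0, size=(4, 6))
w2 = rng.uniform(0.0, 1.0, size=4)
A = np.array([1, 0, 1, 1, 0, 0], dtype=float)  # indicator of the subset {0, 2, 3}
print(dsf_value(A, M1, w2))
```

Under these assumptions, adding an element never decreases the output and marginal gains shrink as the subset grows, which is the diminishing-returns behavior a DSF is designed to encode.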