Deep Submodular Functions: Definitions and Learning
Authors: Brian W. Dolhansky, Jeff A. Bilmes
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We offer preliminary feasibility results showing it is possible to train a DSF on synthetic datasets and, via featurization, on a real image summarization dataset. |
| Researcher Affiliation | Academia | Brian Dolhansky <bdol@cs.washington.edu>, Dept. of Computer Science and Engineering, University of Washington, Seattle, WA 98105; Jeff Bilmes <bilmes@uw.edu>, Dept. of Electrical Engineering, University of Washington, Seattle, WA 98105 |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any statement or link indicating that code for the described method has been open-sourced. |
| Open Datasets | Yes | For our real-world instance of learning DSFs, we use the dataset of [27], which consists of 14 distinct image sets, 100 images each. [27] is: S. Tschiatschek, R. Iyer, H. Wei, and J. Bilmes. Learning mixtures of submodular functions for image collection summarization. In Neural Information Processing Systems (NIPS), Montreal, Canada, December 2014. |
| Dataset Splits | No | The paper mentions training on 13 sets and testing on one, but does not explicitly specify a separate validation split or how validation was performed within the training process. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Adagrad, but does not name any libraries, frameworks, or programming-language versions used for the experiments. |
| Experiment Setup | Yes | We used a simple two-layer DSF, where the first hidden layer consisted of four hidden units with square root activation functions, and a normalized sigmoid σ̂(x) = 2(σ(x) − 0.5) at the output. A DSF is trained with a hidden layer of 10 units of activation g(x) = min(x, 1), and a normalized sigmoid σ̂ at the output. We used (diagonalized) Adagrad, a decaying learning rate, weight decay, and dropout (which was critical for test-set performance). A hedged code sketch of this setup follows the table. |
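
The setup row above is concrete enough to sketch in code. The following is a minimal, illustrative PyTorch rendering of that description, not the authors' implementation: the layer widths, activations, normalized sigmoid, and optimizer family come from the quote, while the ground-set size, all numeric hyperparameters, the `.abs()` non-negativity trick, and the class and variable names are our own assumptions.

```python
import torch
import torch.nn as nn

class TwoLayerDSF(nn.Module):
    """Sketch of the quoted two-layer DSF: non-negative weights feeding
    monotone concave activations, so the set function induced on 0/1
    indicator vectors stays monotone submodular."""

    def __init__(self, n_ground: int, n_hidden: int = 4, p_drop: float = 0.5):
        super().__init__()
        # Unconstrained parameters; .abs() in forward() keeps the
        # effective weights non-negative (one simple way to do it).
        self.w1 = nn.Parameter(torch.rand(n_ground, n_hidden))
        self.w2 = nn.Parameter(torch.rand(n_hidden, 1))
        # "dropout (which was critical for test-set performance)"
        self.drop = nn.Dropout(p_drop)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: 0/1 indicator vectors of sets, shape (batch, n_ground).
        h = torch.sqrt(x @ self.w1.abs() + 1e-8)  # square-root activations (concave)
        # The second quoted experiment instead uses 10 hidden units with the
        # truncation g(x) = min(x, 1), also concave and non-decreasing:
        #   h = torch.clamp(x @ self.w1.abs(), max=1.0)
        h = self.drop(h)
        y = h @ self.w2.abs()
        # Normalized sigmoid σ̂: maps y = 0 to 0, so the empty set scores 0.
        return 2.0 * (torch.sigmoid(y) - 0.5)

# Ground set of 100 elements, matching the 100-image sets of [27];
# every numeric hyperparameter below is a placeholder, not the paper's.
model = TwoLayerDSF(n_ground=100, n_hidden=4)
opt = torch.optim.Adagrad(model.parameters(),
                          lr=0.1, lr_decay=1e-3, weight_decay=1e-4)
```

Keeping the weights non-negative and the activations concave and non-decreasing is what makes the resulting set function monotone submodular, which is the structural property a DSF is defined by; the sigmoid's shift and rescaling preserve the normalization f(∅) = 0.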