Deep Submodular Functions: Definitions and Learning

Authors: Brian W. Dolhansky, Jeff A. Bilmes

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We offer preliminary feasibility results showing it is possible to train a DSF on synthetic datasets and, via featurization, on a real image summarization dataset.
Researcher Affiliation | Academia | Brian Dolhansky <bdol@cs.washington.edu>, Dept. of Computer Science and Engineering, University of Washington, Seattle, WA 98105; Jeff Bilmes <bilmes@uw.edu>, Dept. of Electrical Engineering, University of Washington, Seattle, WA 98105
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link indicating that code for the described methodology is open source.
Open Datasets | Yes | For our real-world instance of learning DSFs, we use the dataset of [27], which consists of 14 distinct image sets, 100 images each. [27] is: S. Tschiatschek, R. Iyer, H. Wei, and J. Bilmes. Learning mixtures of submodular functions for image collection summarization. In Neural Information Processing Systems (NIPS), Montreal, Canada, December 2014.
Dataset Splits | No | The paper mentions training on 13 sets and testing on one, but does not explicitly specify a separate validation split or how validation was performed during training.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions using Adagrad but does not specify software names with version numbers for libraries, frameworks, or programming languages.
Experiment Setup | Yes | We used a simple two-layer DSF, where the first hidden layer consisted of four hidden units with square root activation functions, and a normalized sigmoid σ̂(x) = 2(σ(x) - 0.5) at the output. A DSF is trained with a hidden layer of 10 units with activation g(x) = max(x, 1), and a normalized sigmoid σ̂ at the output. We used (diagonalized) Adagrad, a decaying learning rate, weight decay, and dropout (which was critical for test-set performance).
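The Experiment Setup row quotes a two-layer DSF with square-root activations on four hidden units and a normalized sigmoid σ̂(x) = 2(σ(x) - 0.5) at the output. As a rough illustration only, and not the authors' code, the sketch below evaluates such a DSF on one subset of a small ground set: nonnegative first-layer weights feed concave, non-decreasing activations, which is what keeps the resulting set function monotone submodular. The names (dsf_value, normalized_sigmoid, M1, w2) and the ground-set and hidden-layer sizes are hypothetical, and the training details quoted above (Adagrad, decaying learning rate, weight decay, dropout) are omitted.

```python
import numpy as np

def normalized_sigmoid(x):
    # Normalized sigmoid from the quoted setup: sigma_hat(x) = 2 * (sigma(x) - 0.5).
    # On x >= 0 it is concave, non-decreasing, and maps into [0, 1).
    return 2.0 * (1.0 / (1.0 + np.exp(-x)) - 0.5)

def dsf_value(indicator, M1, w2):
    """Evaluate a two-layer DSF on one subset of the ground set (illustrative only).

    indicator : 0/1 vector of length |V| marking which elements are in the subset.
    M1        : nonnegative (num_hidden, |V|) matrix; row j holds the modular
                weights feeding hidden unit j (hypothetical name and shape).
    w2        : nonnegative (num_hidden,) vector of output-layer weights.
    """
    hidden = np.sqrt(M1 @ indicator)        # square-root activations (four units in the paper)
    return normalized_sigmoid(w2 @ hidden)  # normalized sigmoid at the output

# Toy usage with made-up sizes: 6-element ground set, 4 hidden units.
rng = np.random.default_rng(0)
M1 = rng.uniform(0.0, 1.0, size=(4, 6))
w2 = rng.uniform(0.0, 1.0, size=4)
A = np.array([1, 0, 1, 1, 0, 0], dtype=float)  # indicator of the subset {0, 2, 3}
print(dsf_value(A, M1, w2))
```

Under these assumptions, adding an element never decreases the output and marginal gains shrink as the subset grows, which is the diminishing-returns behavior a DSF is designed to encode.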