Predict and Constrain: Modeling Cardinality in Deep Structured Prediction

Authors: Nataly Brukhim, Amir Globerson

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'Our approach outperforms strong baselines, achieving state-of-the-art results on multi-label classification benchmarks.'
Researcher Affiliation | Academia | Nataly Brukhim and Amir Globerson, Tel Aviv University, Blavatnik School of Computer Science. Correspondence to: Nataly Brukhim <natalybr@mail.tau.ac.il>, Amir Globerson <gamir@post.tau.ac.il>.
Pseudocode | Yes | 'Algorithm 1: Soft projection onto the simplex.' (A reference sketch of simplex projection follows the table.)
Open Source Code | No | The paper mentions that 'The E2E-SPEN results were obtained by running their publicly available code on these datasets.' This refers to a baseline method's code, not the code for the authors' own method.
Open Datasets | Yes | 'We use 3 standard MLC benchmarks, as used by other recent approaches (Belanger & McCallum, 2016; Gygli et al., 2017; Amos & Kolter, 2017): Bibtex, Delicious, and Bookmarks.'
Dataset Splits | No | 'All of the hyperparameters were tuned on development data.' While 'development data' often implies a validation set, the paper does not specify explicit split percentages or counts for the validation data.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions using 'TensorFlow' but does not specify its version number or any other software dependencies with version numbers, which are required for full reproducibility.
Experiment Setup | Yes | 'For all neural networks we use a single hidden layer, with ReLU activations. For the unrolled optimization we used gradient ascent with momentum 0.9, unrolled for T iterations, with T ranging between 10 and 20, and with R = 2 alternating projection iterations. All of the hyperparameters were tuned on development data. We trained our network using AdaGrad (Duchi et al., 2011) with learning rate η = 0.1.' (A sketch of this optimization setup follows the table.)
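
The Pseudocode row refers to the paper's Algorithm 1, a soft (differentiable) projection onto the simplex. As a reference point only, here is a minimal NumPy sketch of the standard hard Euclidean projection onto the probability simplex (the sort-based method of Duchi et al., 2008); the paper's Algorithm 1 is a smoothed variant suited to unrolled end-to-end training, and the function name and implementation below are assumptions for illustration, not the authors' code.

```python
import numpy as np

def project_to_simplex(v):
    """Hard Euclidean projection of v onto the probability simplex
    {y : y >= 0, sum(y) = 1}, following Duchi et al. (2008).
    Shown only as a reference point: the paper's Algorithm 1 is a
    soft, differentiable variant of this kind of projection."""
    n = v.shape[0]
    u = np.sort(v)[::-1]                      # sort entries in descending order
    css = np.cumsum(u)                        # cumulative sums of the sorted entries
    # Largest (1-based) index j with u_j - (css_j - 1) / j > 0
    rho = np.nonzero(u * np.arange(1, n + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)    # shift that enforces the sum-to-one constraint
    return np.maximum(v - theta, 0.0)         # clip shifted entries at zero
```

For instance, `project_to_simplex(np.array([0.5, 0.9, -0.2]))` returns `[0.3, 0.7, 0.0]`, a nonnegative vector that sums to one.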
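
To make the quoted Experiment Setup concrete, the following is a hedged sketch of the described inference loop: gradient ascent with momentum on a relaxed label vector, unrolled for T iterations and interleaved with R alternating-projection passes, using the reported values (momentum 0.9, T between 10 and 20, R = 2). The names `score_grad` and `projections` and the step size are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def unrolled_inference(score_grad, projections, y0, T=20, R=2,
                       step=0.1, momentum=0.9):
    """Unrolled gradient ascent with momentum on a relaxed label vector y,
    with R alternating-projection passes after each ascent step, mirroring
    the setup quoted above. score_grad, projections, and step are
    illustrative assumptions."""
    y = np.array(y0, dtype=float)
    velocity = np.zeros_like(y)
    for _ in range(T):
        velocity = momentum * velocity + step * score_grad(y)  # momentum update
        y = y + velocity                                        # gradient ascent step
        for _ in range(R):                                      # R alternating-projection passes
            for project in projections:                         # e.g. box and simplex projections
                y = project(y)
    return y
```

Training the network parameters themselves with AdaGrad at learning rate 0.1, as the row states, would sit in the outer training loop; in the TensorFlow 1.x API of the paper's era this corresponds to something like `tf.train.AdagradOptimizer(learning_rate=0.1)`, though the paper does not show its training code.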