Predict and Constrain: Modeling Cardinality in Deep Structured Prediction
Authors: Nataly Brukhim, Amir Globerson
ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our approach outperforms strong baselines, achieving state-of-the-art results on multi-label classification benchmarks. |
| Researcher Affiliation | Academia | Nataly Brukhim, Amir Globerson (Tel Aviv University, Blavatnik School of Computer Science). Correspondence to: Nataly Brukhim <natalybr@mail.tau.ac.il>, Amir Globerson <gamir@post.tau.ac.il>. |
| Pseudocode | Yes | Algorithm 1: Soft projection onto the simplex (a reference sketch of the underlying hard projection appears after this table). |
| Open Source Code | No | The paper mentions that 'The E2E-SPEN results were obtained by running their publicly available code on these datasets.' This refers to a baseline method's code, not the code for the authors' own method. |
| Open Datasets | Yes | We use 3 standard MLC benchmarks, as used by other recent approaches (Belanger & McCallum, 2016; Gygli et al., 2017; Amos & Kolter, 2017): Bibtex, Delicious, and Bookmarks. |
| Dataset Splits | No | The paper states that 'All of the hyperparameters were tuned on development data.' While 'development data' usually implies a validation set, the paper does not specify explicit split percentages or counts for the validation data. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'TensorFlow' but does not specify its version number or any other software dependencies with version numbers, which are required for full reproducibility. |
| Experiment Setup | Yes | For all neural networks we use a single hidden layer, with ReLU activations. For the unrolled optimization we used gradient ascent with momentum 0.9, unrolled for T iterations, with T ranging between 10 and 20, and with R = 2 alternating projection iterations. All of the hyperparameters were tuned on development data. We trained our network using AdaGrad (Duchi et al., 2011) with learning rate η = 0.1. (A hypothetical configuration sketch appears after this table.) |
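
The pseudocode row above quotes the title of the paper's Algorithm 1, a differentiable ("soft") projection onto the simplex. For reference, the sketch below implements the classical hard, sort-based Euclidean projection onto the probability simplex that such soft variants relax; it is not the paper's algorithm, whose exact smoothing is not reproduced here.

```python
import numpy as np

def project_to_simplex(v):
    """Hard Euclidean projection of a 1-D vector v onto the probability
    simplex {x : x >= 0, sum(x) = 1}, via the standard sort-based method.
    The paper's Algorithm 1 replaces the non-differentiable steps below
    (sorting, thresholding) with smooth approximations."""
    n = v.shape[0]
    u = np.sort(v)[::-1]                                # sort in descending order
    css = np.cumsum(u)                                  # running sums of sorted values
    rho = np.nonzero(u + (1.0 - css) / np.arange(1, n + 1) > 0)[0][-1]
    theta = (1.0 - css[rho]) / (rho + 1.0)              # shift enforcing sum(x) = 1
    return np.maximum(v + theta, 0.0)                   # clip to the non-negative orthant

# Example: project an arbitrary score vector onto the simplex.
print(project_to_simplex(np.array([0.8, 1.5, -0.3])))   # non-negative, sums to 1
```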
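
The experiment-setup row describes an unrolled, projected gradient-ascent inference loop. The snippet below is a minimal, hypothetical sketch of that configuration only, not the authors' implementation: `score_grad`, `project_simplex`, and `project_cardinality` are assumed placeholder callables standing in for the paper's score-network gradient and its two constraint projections.

```python
import numpy as np

def unrolled_inference(y0, score_grad, project_simplex, project_cardinality,
                       T=10, R=2, step_size=0.1, momentum=0.9):
    """Hypothetical sketch: T gradient-ascent steps with momentum 0.9 on a score
    function, each followed by R = 2 alternating-projection iterations onto two
    constraint sets (placeholders for the paper's simplex and cardinality sets)."""
    y = y0.copy()
    velocity = np.zeros_like(y0)
    for _ in range(T):
        # momentum gradient-ascent step on the (placeholder) score function
        velocity = momentum * velocity + step_size * score_grad(y)
        y = y + velocity
        # R rounds of alternating projections onto the two constraint sets
        for _ in range(R):
            y = project_cardinality(project_simplex(y))
    return y
```

In the paper this loop is unrolled inside the computation graph so that training (AdaGrad with learning rate 0.1, per the row above) can backpropagate through the T inference steps; the sketch shows only the forward structure.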