Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction
Authors: Roei Herzig, Moshiko Raboh, Gal Chechik, Jonathan Berant, Amir Globerson
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our approach, we first demonstrate on a synthetic dataset that respecting permutation invariance is important, because models that violate this invariance need more training data, despite having a comparable model size. Then, we tackle the problem of scene graph generation. We describe a model that satisfies the permutation invariance property, and show that it achieves state-of-the-art results on the competitive Visual Genome benchmark [15], demonstrating the power of our new design principle. |
| Researcher Affiliation | Collaboration | Roei Herzig (Tel Aviv University, roeiherzig@mail.tau.ac.il); Moshiko Raboh (Tel Aviv University, mosheraboh@mail.tau.ac.il); Gal Chechik (Bar-Ilan University, NVIDIA Research, gal.chechik@biu.ac.il); Jonathan Berant (Tel Aviv University, AI2, joberant@cs.tau.ac.il); Amir Globerson (Tel Aviv University, gamir@post.tau.ac.il) |
| Pseudocode | No | The paper includes mathematical equations and a schematic diagram (Figure 2) to describe the architecture, but it does not provide any structured pseudocode or an algorithm block. |
| Open Source Code | Yes | The full code is available at https://github.com/shikorab/SceneGraph |
| Open Datasets | Yes | We evaluated our approach on Visual Genome (VG) [15], a dataset with 108,077 images annotated with bounding boxes, entities and relations. ... [15] Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, et al. Visual genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1):32–73, 2017. |
| Dataset Splits | Yes | To tune hyper-parameters, we also split the training data into two by randomly selecting 5K examples, resulting in a final 70K/5K/32K split for train/validation/test sets. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments, such as specific GPU models, CPU types, or memory. |
| Software Dependencies | No | The paper mentions training 'using Adam [14]' but does not provide specific software or library names with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other dependencies). |
| Experiment Setup | Yes | All networks were trained using Adam [14] with batch size 20. ... In the loss, we penalized entities 4 times more strongly than relations, and penalized negative relations 10 times more weakly than positive relations. ... The φ and α networks were each implemented as a single fully-connected (FC) layer with a 500-dimensional output. ρ was implemented as a FC network with three 500-dimensional hidden layers, with one 150-dimensional output for the entity probabilities and one 51-dimensional output for relation probabilities. |
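The layer sizes quoted in the Experiment Setup cell can be sketched as a minimal NumPy forward pass. This is an illustrative shape-check only, not the authors' implementation: the input feature dimension `FEAT_DIM`, the ReLU activations, the softmax heads, and the weight initialization are all assumptions not stated in the quoted text.

```python
import numpy as np

rng = np.random.default_rng(0)

def fc(in_dim, out_dim):
    """A single fully-connected layer, returned as a closure over (W, b)."""
    W = rng.standard_normal((in_dim, out_dim)) * 0.01
    b = np.zeros(out_dim)
    return lambda x: x @ W + b

def relu(x):
    return np.maximum(x, 0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

FEAT_DIM = 512  # hypothetical input feature size; not specified in the quote

# phi and alpha: each a single FC layer with a 500-dimensional output
phi = fc(FEAT_DIM, 500)
alpha = fc(FEAT_DIM, 500)

# rho: three 500-dimensional hidden layers, then two output heads,
# 150-dim for entity probabilities and 51-dim for relation probabilities
h1, h2, h3 = fc(500, 500), fc(500, 500), fc(500, 500)
entity_head = fc(500, 150)
relation_head = fc(500, 51)

def rho(z):
    h = relu(h3(relu(h2(relu(h1(z))))))
    return softmax(entity_head(h)), softmax(relation_head(h))

# Shape check on a batch of 4 hypothetical entity feature vectors
z = phi(rng.standard_normal((4, FEAT_DIM)))
ent_probs, rel_probs = rho(z)
print(ent_probs.shape, rel_probs.shape)  # (4, 150) (4, 51)
```

Running the sketch confirms the output dimensionalities match the quoted description: 150 entity classes and 51 relation classes per prediction.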