Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction

Authors: Roei Herzig, Moshiko Raboh, Gal Chechik, Jonathan Berant, Amir Globerson

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate our approach, we first demonstrate on a synthetic dataset that respecting permutation invariance is important, because models that violate this invariance need more training data, despite having a comparable model size. Then, we tackle the problem of scene graph generation. We describe a model that satisfies the permutation invariance property, and show that it achieves state-of-the-art results on the competitive Visual Genome benchmark [15], demonstrating the power of our new design principle. (The invariance property is illustrated in the first sketch after this table.)
Researcher Affiliation | Collaboration | Roei Herzig, Tel Aviv University (roeiherzig@mail.tau.ac.il); Moshiko Raboh, Tel Aviv University (mosheraboh@mail.tau.ac.il); Gal Chechik, Bar-Ilan University and NVIDIA Research (gal.chechik@biu.ac.il); Jonathan Berant, Tel Aviv University and AI2 (joberant@cs.tau.ac.il); Amir Globerson, Tel Aviv University (gamir@post.tau.ac.il)
Pseudocode | No | The paper includes mathematical equations and a schematic diagram (Figure 2) to describe the architecture, but does not provide structured pseudocode or an algorithm block.
Open Source Code | Yes | The full code is available at https://github.com/shikorab/SceneGraph
Open Datasets | Yes | We evaluated our approach on Visual Genome (VG) [15], a dataset with 108,077 images annotated with bounding boxes, entities and relations. ... [15] Ranjay Krishna, Yuke Zhu, Oliver Groth, Justin Johnson, Kenji Hata, Joshua Kravitz, Stephanie Chen, Yannis Kalantidis, Li-Jia Li, David A. Shamma, et al. Visual Genome: Connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision, 123(1):32–73, 2017.
Dataset Splits | Yes | To tune hyper-parameters, we also split the training data into two by randomly selecting 5K examples, resulting in a final 70K/5K/32K split for train/validation/test sets. (A minimal split sketch follows the table.)
Hardware Specification | No | The paper does not specify the hardware used for running the experiments, such as specific GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions training 'using Adam [14]' but does not provide specific software or library names with version numbers (e.g., Python, PyTorch, TensorFlow versions, or other dependencies).
Experiment Setup | Yes | All networks were trained using Adam [14] with batch size 20. ... In the loss, we penalized entities 4 times more strongly than relations, and penalized negative relations 10 times more weakly than positive relations. ... The φ and α networks were each implemented as a single fully-connected (FC) layer with a 500-dimensional output. ρ was implemented as an FC network with 3 500-dimensional hidden layers, with one 150-dimensional output for the entity probabilities and one 51-dimensional output for the relation probabilities. (A schematic instantiation follows the table.)
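
The Research Type row rests on the paper's graph-permutation-invariance principle: permuting the nodes of the input graph should permute the predicted labels in exactly the same way. Below is a minimal NumPy sketch of that property, not the paper's model; the linear maps W1/W2 and the sum aggregation are illustrative stand-ins.

```python
import numpy as np

# A toy per-node predictor: each node's output depends on its own feature
# plus an order-independent summary (a sum) of all node features.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 3))   # per-node map (illustrative)
W2 = rng.normal(size=(8, 3))   # graph-summary map (illustrative)

def predict(node_feats):
    graph_summary = node_feats.sum(axis=0)        # invariant to node order
    return node_feats @ W1 + graph_summary @ W2   # one output row per node

feats = rng.normal(size=(5, 8))                   # 5 nodes, 8-dim features
perm = rng.permutation(5)

# Permuting the inputs permutes the outputs identically: the defining
# property of graph-permutation invariance.
assert np.allclose(predict(feats)[perm], predict(feats[perm]))
```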
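
The Dataset Splits row describes carving a 5K validation set out of the training data at random. A minimal sketch of that step, assuming a ~75K-example training pool (the pre-split size is implied by the 70K/5K figures, not quoted directly) and an arbitrary seed:

```python
import numpy as np

rng = np.random.default_rng(0)   # seed is an illustrative choice
train_ids = np.arange(75_000)    # placeholder for the ~75K training images

val_ids = rng.choice(train_ids, size=5_000, replace=False)  # 5K validation
train_ids = np.setdiff1d(train_ids, val_ids)                # ~70K remain for training
```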
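
Finally, the Experiment Setup row fixes layer sizes, optimizer, and loss weights, but not how φ, α, and ρ are composed (that is the paper's architecture). The PyTorch sketch below only instantiates the quoted numbers; the input width D, the choice of class 0 as the negative-relation index, and the use of PyTorch itself are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 512  # input feature width: a placeholder, not given in the quote

phi = nn.Linear(D, 500)    # single FC layer, 500-dim output
alpha = nn.Linear(D, 500)  # single FC layer, 500-dim output
rho = nn.Sequential(       # FC network with 3 hidden layers of 500 units
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
    nn.Linear(500, 500), nn.ReLU(),
)
entity_head = nn.Linear(500, 150)   # 150-dim entity output
relation_head = nn.Linear(500, 51)  # 51-dim relation output

params = [p for m in (phi, alpha, rho, entity_head, relation_head)
          for p in m.parameters()]
optimizer = torch.optim.Adam(params)  # paper trains with batch size 20

def weighted_loss(entity_logits, entity_labels, rel_logits, rel_labels):
    """Entities weighted 4x relative to relations; negative relations
    down-weighted 10x (class 0 = 'no relation' is an assumption here)."""
    ent = F.cross_entropy(entity_logits, entity_labels)
    rel_weights = torch.ones(51)
    rel_weights[0] = 0.1
    rel = F.cross_entropy(rel_logits, rel_labels, weight=rel_weights)
    return 4.0 * ent + rel
```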