Joint Modeling of Visual Objects and Relations for Scene Graph Generation
Authors: Minghao Xu, Meng Qu, Bingbing Ni, Jian Tang
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on both the relationship retrieval and zero-shot relationship retrieval tasks prove the efficiency and efficacy of our proposed approach. |
| Researcher Affiliation | Academia | ¹Shanghai Jiao Tong University, Shanghai 200240, China; ²Mila Québec AI Institute; ³University of Montréal; ⁴HEC Montréal; ⁵CIFAR AI Research Chair. {xuminghao118, nibingbing}@sjtu.edu.cn, meng.qu@umontreal.ca, jian.tang@hec.ca |
| Pseudocode | Yes | Algorithm 1 Inference algorithm of JM-SGG. |
| Open Source Code | No | Our method is implemented under PyTorch [25], and the source code will be released for reproducibility. |
| Open Datasets | Yes | We use the Visual Genome (VG) dataset [16] (CC BY 4.0 License), a large-scale database with structured image concepts, for evaluation. We use the pre-processed VG from Xu et al. [48] (MIT License) which contains 108k images with 150 object categories and 50 relation types. |
| Dataset Splits | Yes | Following previous works [53, 36, 37], we employ the original split with 70% images for training and 30% images for test, and 5k images randomly sampled from the training split are held out for validation. |
| Hardware Specification | Yes | An NVIDIA Tesla V100 GPU is used for training. |
| Software Dependencies | No | Our method is implemented under PyTorch [25]. The paper mentions PyTorch but does not specify a version number or any other software dependencies with version information. |
| Experiment Setup | Yes | In our experiments, the object detector is first pre-trained by an SGD optimizer (batch size: 4, initial learning rate: 0.001, momentum: 0.9, weight decay: $5 \times 10^{-4}$) for 20 epochs, and the learning rate is multiplied by 0.1 after the 10th epoch. During maximum likelihood learning, we train the potential functions and fine-tune the object detector with another SGD optimizer (batch size: 4, potential function learning rate: 0.001, detector learning rate: 0.0001, momentum: 0.9, weight decay: $5 \times 10^{-4}$) for 10 epochs, and the learning rate is multiplied by 0.1 after the 5th epoch. Without otherwise specified, the iteration number $N_T$ is set as 1 for training and 2 for test, and the per image sampling size $N_S$ is set as 3. |
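
The two-stage schedule quoted in the Experiment Setup row can be written out as a small configuration sketch. This is an illustration only, not the authors' code: the names `step_lr`, `schedule`, `PRETRAIN`, `FINETUNE`, and `SGD_SHARED` are hypothetical, the weight decay is read as 5e-4, and "after the 10th epoch" is assumed to mean that the decayed rate applies from the 11th epoch onward (index 10 when counting from zero).

```python
# Hypothetical sketch of the reported two-stage SGD schedule (plain Python, no PyTorch needed).

# Hyperparameters reported as shared by both stages; weight decay read as 5e-4.
SGD_SHARED = {"batch_size": 4, "momentum": 0.9, "weight_decay": 5e-4}

# Stage 1: object detector pre-training (20 epochs, decay after the 10th).
PRETRAIN = {"epochs": 20, "decay_epoch": 10, "lr": {"detector": 1e-3}}

# Stage 2: maximum likelihood learning (10 epochs, decay after the 5th),
# with separate rates for the potential functions and the detector.
FINETUNE = {"epochs": 10, "decay_epoch": 5, "lr": {"potential": 1e-3, "detector": 1e-4}}


def step_lr(initial_lr, epoch, decay_epoch, factor=0.1):
    """Rate for a 0-indexed epoch: scaled by `factor` once `decay_epoch` epochs are done."""
    return initial_lr * factor if epoch >= decay_epoch else initial_lr


def schedule(stage):
    """Per-epoch learning rates for every parameter group of a stage."""
    return {
        group: [step_lr(lr, epoch, stage["decay_epoch"]) for epoch in range(stage["epochs"])]
        for group, lr in stage["lr"].items()
    }


if __name__ == "__main__":
    for name, stage in (("pretrain", PRETRAIN), ("finetune", FINETUNE)):
        for group, rates in schedule(stage).items():
            print(name, group, rates)
```

In a PyTorch training loop the same behavior would typically come from an SGD optimizer with two parameter groups and a step-based scheduler; since the paper gives no version or implementation details, the exact calls used by the authors are not known.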