Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency

Authors: Hyeongjin Kim, Sangwon Kim, Dasom Ahn, Jong Taek Lee, Byoung Chul Ko

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We applied the proposed model to the SGG benchmark dataset, and the results showed a performance improvement of up to 3.8% compared with existing state-of-the-art models in the SGGen subtask. The proposed method exhibits generalization ability from the results obtained, showing uniform performance improvement for all MPNN models."
Researcher Affiliation | Collaboration | "(1) Department of Computer Engineering, Keimyung University, Daegu, South Korea; (2) Electronics and Telecommunications Research Institute (ETRI), Daegu 42994, South Korea; (3) Department of Computer Engineering, Kyungpook National University, Daegu, South Korea."
Pseudocode | Yes | "Algorithm 1: Processing of the TF-l-IDF layer" (see the illustrative sketch after this table)
Open Source Code | No | The paper references PySGG (Tang, 2020) for detailed training environments of additional models and provides a URL to a general benchmark framework (https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch), but it does not explicitly state that the authors' own source code for the proposed CooK + TF-l-IDF method is open source, nor does it provide a link to such code.
Open Datasets | Yes | "To verify the performance of the proposed CooK + TF-l-IDF method, experiments were conducted on the following two datasets: Visual Genome (Xu et al., 2017b) and Open Images (Kuznetsova et al., 2020)."
Dataset Splits | Yes | "The dataset [Visual Genome] was divided into 70% training data and 30% test data. ... A total of 126,368 images were used for training; 1,813 and 5,322 images were used for validation and testing, respectively [Open Images]."
Hardware Specification | Yes | "All of the experiments were conducted on a private machine equipped with two Intel(R) Xeon(R) Gold 6230R CPUs @ 2.10 GHz, 128 GB RAM, and an NVIDIA RTX 3090 GPU."
Software Dependencies | No | The paper mentions using GloVe (Pennington et al., 2014) for word embedding and refers to PySGG (Tang, 2020) for training environments, but it does not specify particular software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions) directly attributable to the method's implementation.
Experiment Setup | Yes | "Table 6 shows the model configurations on each benchmark dataset: LR 0.008; LR decay WarmupMultiStepLR; weight decay 5 × 10^-5; 49,500 iterations; batch size 12 / 9 / 9; 4 layers; object dim 128; relation dim 128." (see the configuration sketch below)
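
For readers who want a concrete picture of what the "TF-l-IDF layer" named in the Pseudocode row could look like, here is a minimal PyTorch sketch. It assumes a TF-IDF-style reweighting of per-predicate logits in which the term-frequency component is a learnable parameter; the class name, shapes, and weighting scheme are illustrative assumptions, not a reproduction of the paper's Algorithm 1.

```python
import torch
import torch.nn as nn

class TFlIDFLayer(nn.Module):
    """Illustrative sketch of a learnable-TF, fixed-IDF weighting layer.

    Hypothetical design: the paper's Algorithm 1 ("Processing of the
    TF-l-IDF layer") may differ in structure and normalization.
    """

    def __init__(self, num_predicates: int, doc_freq: torch.Tensor, num_images: int):
        super().__init__()
        # Learnable term-frequency weight, one scalar per predicate class.
        self.tf = nn.Parameter(torch.ones(num_predicates))
        # Fixed inverse document frequency from corpus statistics:
        # doc_freq[i] = number of training images containing predicate i.
        idf = torch.log(num_images / (doc_freq.float() + 1.0))
        self.register_buffer("idf", idf)

    def forward(self, logits: torch.Tensor) -> torch.Tensor:
        # Reweight each predicate logit by (learnable TF) x (fixed IDF);
        # ReLU keeps the learned term-frequency weights non-negative.
        return logits * (torch.relu(self.tf) * self.idf)


# Usage with made-up numbers (50 predicate classes, 126,368 training images):
layer = TFlIDFLayer(50, doc_freq=torch.randint(1, 100_000, (50,)), num_images=126_368)
scores = layer(torch.randn(8, 50))  # a batch of 8 relation proposals
```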
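The Table 6 values quoted in the Experiment Setup row can also be rendered as a plain-Python configuration for reference. The key names are hypothetical (not the paper's actual config schema), and the mapping of the three batch-size values to particular settings is not specified in this report.

```python
# Hypothetical config schema; values are the Table 6 numbers quoted above.
TABLE6_CONFIG = {
    "lr": 0.008,                    # base learning rate
    "lr_scheduler": "WarmupMultiStepLR",
    "weight_decay": 5e-5,
    "max_iterations": 49_500,
    "batch_sizes": [12, 9, 9],      # three reported values; per-setting mapping unspecified
    "num_layers": 4,
    "object_dim": 128,
    "relation_dim": 128,
}
```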