Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency
Authors: Hyeongjin Kim, Sangwon Kim, Dasom Ahn, Jong Taek Lee, Byoung Chul Ko
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We applied the proposed model to the SGG benchmark dataset, and the results showed a performance improvement of up to 3.8% compared with existing state-of-the-art models on the SGGen subtask. The proposed method exhibits generalization ability, showing a uniform performance improvement across all MPNN models. |
| Researcher Affiliation | Collaboration | 1Department of Computer Engineering, Keimyung University, Daegu, South Korea 2Electronics and Telecommunications Research Institute (ETRI), Daegu 42994, South Korea 3Department of Computer Engineering, Kyungpook National University, Daegu, South Korea. |
| Pseudocode | Yes | Algorithm 1 Processing of the TF-l-IDF layer |
| Open Source Code | No | The paper references 'PySGG (Tang, 2020) for detailed training environments of additional models' and provides a URL to a general benchmark framework (https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch), but does not explicitly state that the authors' own source code for the proposed CooK + TF-l-IDF method is open source or provide a link to it. |
| Open Datasets | Yes | To verify the performance of the proposed CooK + TF-l-IDF method, experiments were conducted on the following two datasets: Visual Genome (Xu et al., 2017b) and Open Images (Kuznetsova et al., 2020). |
| Dataset Splits | Yes | The dataset [Visual Genome] was divided into 70% training data and 30% test data... A total of 126,368 images were used for training; 1,813 and 5,322 images were used for validation and testing, respectively [Open Images]. |
| Hardware Specification | Yes | All of the experiments were conducted on a private machine equipped with two Intel(R) Xeon(R) Gold 6230R CPUs @ 2.10 GHz, 128 GB RAM, and an NVIDIA RTX 3090 GPU. |
| Software Dependencies | No | The paper mentions using 'GloVe (Pennington et al., 2014)' for word embedding and refers to 'PySGG (Tang, 2020)' for training environments, but does not specify particular software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions) that are directly attributable to their method's implementation. |
| Experiment Setup | Yes | Table 6 shows the model configurations on each benchmark dataset: LR 0.008; LR decay: Warmup MultiStepLR; Weight decay 5×10⁻⁵; Iterations 49,500; Batch size 12/9/9; Num layers 4; Object dim 128; Relation dim 128. |
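For context on the pseudocode row above: the paper's Algorithm 1 describes a TF-l-IDF layer, a learnable variant of term frequency-inverse document frequency weighting. The paper's algorithm itself is not reproduced in this report; the sketch below is only the classical TF-IDF computation that such a layer builds on, in plain Python with a hypothetical toy corpus of predicate labels (all names here are illustrative, not from the paper).

```python
import math
from collections import Counter

def tf_idf(docs):
    """Classical TF-IDF: weight(t, d) = tf(t, d) * idf(t).

    tf(t, d) is the raw count of term t in document d, normalized
    by document length; idf(t) = log(N / df(t)), where df(t) is the
    number of documents containing t. Returns one {term: weight}
    dict per document.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy corpus (hypothetical): each "document" is a list of predicates.
docs = [["on", "on", "near"], ["on", "holding"], ["on", "near", "holding"]]
weights = tf_idf(docs)
# A term present in every document gets idf = log(1) = 0,
# so its weight vanishes; rarer terms are up-weighted.
```

In the paper's variant, the term-frequency component is learnable rather than a fixed count ratio, which is what distinguishes TF-l-IDF from this classical formulation.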