Adaptive Self-training Framework for Fine-grained Scene Graph Generation
Authors: Kibum Kim, Kanghoon Yoon, Yeonjun In, Jinyoung Moon, Donghyun Kim, Chanyoung Park
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing the performance on fine-grained predicate classes. Our code is available on https://github.com/rlqja1107/torch-ST-SGG |
| Researcher Affiliation | Academia | Kibum Kim¹, Kanghoon Yoon¹, Yeonjun In¹, Jinyoung Moon², Donghyun Kim³, Chanyoung Park¹ (¹KAIST, ²ETRI, ³Korea University); {kb.kim,ykhoon08,yeonjun.in,cy.park}@kaist.ac.kr, jymoon@etri.re.kr, d_kim@korea.ac.kr |
| Pseudocode | Yes | For better understanding of ST-SGG, we provide the details of the procedure in Algorithm 1 and Algorithm 2. (See the illustrative pseudo-labeling sketch below the table.) |
| Open Source Code | Yes | Our code is available on https://github.com/rlqja1107/torch-ST-SGG |
| Open Datasets | Yes | Through extensive experiments on VG and Open Images V6 (OI-V6), we verify that ST-SGG is effective when applied to existing SGG models, particularly enhancing the performance on fine-grained predicate classes. |
| Dataset Splits | Yes | The Visual Genome dataset, comprising 108K images, is divided into a 70% training set and a 30% test set, with 5K images from the training set utilized for the validation set. After preprocessing, OI-V6 is split into 126,368 train images, 1,813 validation images, and 6,322 test images. |
| Hardware Specification | Yes | For each experiment, we used the A6000 GPU device. |
| Software Dependencies | No | The paper mentions using "Faster R-CNN (Ren et al., 2015) with ResNeXt-101-FPN (Xie et al., 2017) backbone network" as the object detector, but does not provide specific version numbers for software components like Python, PyTorch, TensorFlow, or other libraries. |
| Experiment Setup | Yes | In the SGDet task, we select the top 80 entity proposals sorted by scores computed by the object detector, and use per-class non-maximal suppression (NMS) at IoU 0.5. For ST-SGG, we conduct a grid search over the momentum rates α_inc and α_dec with an interval of 0.2 (Sec. 4.2), and set the coefficient of the loss for pseudo-labeled predicates β to 1.0 (Equation 1) in the base SGG model. On the other hand, we set β to 0.1 when adopting the re-weight loss in Appendix E.4. We set the initial value of τ_c^(t) for all predicate classes to 0 based on our observation that the performance of ST-SGG is not sensitive to the initial value. We set the maximum number of pseudo-labeled instances per class to 3 in an image. For the graph structure learner, we set the temperature τ to 0.5 and γ in the focal loss to 2.0. (These settings are gathered in the configuration sketch below the table.) |
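The table quotes that the paper provides Algorithm 1 and Algorithm 2. As a reading aid, here is a minimal sketch of the class-specific adaptive-threshold pseudo-labeling idea the quoted settings imply, assuming per-class confidence scores for unannotated triplets and EMA-style threshold updates driven by the momentum rates α_inc and α_dec. The function name, data layout, and exact update rule are our illustrative assumptions, not the authors' implementation; the real code is in their repository.

```python
# Minimal sketch of class-specific adaptive thresholding for pseudo-labeling
# (our illustrative reading of the quoted ST-SGG setup; NOT the authors' code).
import numpy as np

def assign_pseudo_labels(probs, thresholds, alpha_inc=0.4, alpha_dec=0.4,
                         max_per_class=3):
    """Assign pseudo-labels for one image.

    probs:      (num_unannotated_triplets, num_predicate_classes) confidences
    thresholds: (num_predicate_classes,) per-class thresholds, initialized to 0
    """
    pseudo = {}                                   # triplet index -> class
    counts = np.zeros(probs.shape[1], dtype=int)  # pseudo-labels per class
    conf = probs.max(axis=1)                      # top confidence per triplet
    cls = probs.argmax(axis=1)                    # most likely predicate

    for i in np.argsort(-conf):                   # most confident first
        c = cls[i]
        if conf[i] > thresholds[c] and counts[c] < max_per_class:
            pseudo[int(i)] = int(c)
            counts[c] += 1
            # Confident assignment: raise this class's bar (EMA increase).
            thresholds[c] = (1 - alpha_inc) * thresholds[c] + alpha_inc * conf[i]
        else:
            # No assignment: decay the bar so under-confident (often
            # fine-grained) classes can be pseudo-labeled later.
            thresholds[c] *= (1 - alpha_dec)
    return pseudo, thresholds
```

With thresholds initialized to zero, as in the quoted setup, the first confident prediction per class is accepted and the bar then adapts per class, which is what lets fine-grained predicates accumulate pseudo-labels instead of being crowded out by head classes.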
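For convenience, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration block. This is a hedged transcription with invented key names, not a file from the authors' repository; the grid values for α_inc and α_dec are an assumption, since the quote gives only the 0.2 search interval.

```python
# Hyperparameters from the Experiment Setup row, collected in one place.
# Key names and the alpha grid ranges are illustrative assumptions.
ST_SGG_CONFIG = {
    "sgdet_top_k_proposals": 80,     # top entity proposals by detector score
    "sgdet_nms_iou": 0.5,            # per-class NMS IoU threshold
    "alpha_inc_grid": [0.2, 0.4, 0.6, 0.8],  # searched at 0.2 intervals (range assumed)
    "alpha_dec_grid": [0.2, 0.4, 0.6, 0.8],  # searched at 0.2 intervals (range assumed)
    "beta_pseudo_loss": 1.0,         # pseudo-label loss coefficient (Equation 1)
    "beta_with_reweight": 0.1,       # used with the re-weight loss (Appendix E.4)
    "initial_class_threshold": 0.0,  # tau_c^(t) initialization, all classes
    "max_pseudo_per_class_per_image": 3,
    "gsl_temperature": 0.5,          # graph structure learner temperature tau
    "focal_loss_gamma": 2.0,
}
```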