Adaptive Self-training Framework for Fine-grained Scene Graph Generation

Authors: Kibum Kim, Kanghoon Yoon, Yeonjun In, Jinyoung Moon, Donghyun Kim, Chanyoung Park

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our extensive experiments verify the effectiveness of ST-SGG on various SGG models, particularly in enhancing the performance on fine-grained predicate classes. Our code is available on https://github.com/rlqja1107/torch-ST-SGG
Researcher Affiliation | Academia | Kibum Kim¹, Kanghoon Yoon¹, Yeonjun In¹, Jinyoung Moon², Donghyun Kim³, Chanyoung Park¹ (¹KAIST, ²ETRI, ³Korea University); {kb.kim,ykhoon08,yeonjun.in,cy.park}@kaist.ac.kr, jymoon@etri.re.kr, d_kim@korea.ac.kr
Pseudocode | Yes | For better understanding of ST-SGG, we provide the details of the procedure in Algorithm 1 and Algorithm 2. (A hedged sketch of the thresholding and pseudo-labeling procedure these algorithms describe is given after the table.)
Open Source Code | Yes | Our code is available on https://github.com/rlqja1107/torch-ST-SGG
Open Datasets | Yes | Through extensive experiments on VG and Open Images V6 (OI-V6), we verify that ST-SGG is effective when applied to existing SGG models, particularly enhancing the performance on fine-grained predicate classes.
Dataset Splits | Yes | The Visual Genome dataset comprising 108K images is divided into a 70% training set and a 30% test set, with 5K images from the training set utilized for the validation set. After preprocessing, OI-V6 is split into 126,368 train images, 1,813 validation images, and 6,322 test images.
Hardware Specification | Yes | For each experiment, we used the A6000 GPU device.
Software Dependencies | No | The paper mentions using "Faster R-CNN (Ren et al., 2015) with ResNeXt-101-FPN (Xie et al., 2017) backbone network" as the object detector, but does not provide specific version numbers for software components such as Python, PyTorch, TensorFlow, or other libraries.
Experiment Setup | Yes | In SGDet task, we select the top 80 entity proposals sorted by scores computed by the object detector, and use per-class non-maximal suppression (NMS) at IoU 0.5. For ST-SGG, we conduct a grid search for the momentum rates α_inc and α_dec with an interval of 0.2 (Sec. 4.2), and set the coefficient of the loss for pseudo-labeled predicates β to 1.0 (Equation 1) in the base SGG model. On the other hand, we set β to 0.1 when adopting the re-weight loss in Appendix E.4. We set the initial value of τ_c^(t) for all predicate classes to 0 based on our observation that the performance of ST-SGG is not sensitive to the initial value. We set the maximum number of pseudo-labeled instances per class to 3 in an image. For the graph structure learner, we set the temperature τ to 0.5 and γ in focal loss to 2.0. (These hyperparameters are gathered into a single config sketch after the table.)
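For readers who want a concrete picture of what Algorithms 1 and 2 of the paper describe, below is a minimal PyTorch sketch of the two ingredients reported above: a class-specific threshold τ_c updated with increasing/decreasing momentum rates (α_inc, α_dec, initialized to 0), and pseudo-labeling capped at three instances per class per image. The EMA-style raise/decay rule and all names (`update_thresholds`, `select_pseudo_labels`) are illustrative assumptions, not the paper's exact procedure or the released code.

```python
import torch

def update_thresholds(tau, conf, pred_cls, alpha_inc=0.4, alpha_dec=0.4):
    # tau      : (C,) per-predicate-class thresholds tau_c, initialized to 0
    # conf     : (N,) model confidence for each unannotated triplet
    # pred_cls : (N,) predicted predicate class index for each triplet
    # NOTE: this EMA-style update is an assumption; the paper's exact rule
    # is given in its Algorithm 1.
    for c in pred_cls.unique():
        conf_c = conf[pred_cls == c]
        confident = conf_c[conf_c > tau[c]]
        if confident.numel() > 0:
            # Confident predictions exist: pull the threshold upward.
            tau[c] = (1 - alpha_inc) * tau[c] + alpha_inc * confident.mean()
        else:
            # No confident prediction: relax the threshold so this
            # (often fine-grained) class can still be pseudo-labeled later.
            tau[c] = (1 - alpha_dec) * tau[c] + alpha_dec * conf_c.max()
    return tau

def select_pseudo_labels(conf, pred_cls, tau, max_per_class=3):
    # Keep at most `max_per_class` confident predictions per class in an
    # image, matching the cap of 3 pseudo-labeled instances reported above.
    keep = torch.zeros_like(conf, dtype=torch.bool)
    for c in pred_cls.unique():
        idx = torch.nonzero((pred_cls == c) & (conf > tau[c])).flatten()
        if idx.numel() > max_per_class:
            # Retain only the top-k most confident instances of this class.
            idx = idx[conf[idx].topk(max_per_class).indices]
        keep[idx] = True
    return keep
```

A training step would then assign pseudo-labels only to the triplets where `keep` is true and refresh `tau` with the batch confidences, leaving the annotated triplets untouched.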
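The hyperparameters stated in the Experiment Setup row, consolidated into one hypothetical config. The key names are illustrative and do not come from the released repository, and the grid-search values for α_inc/α_dec are an assumption (only the 0.2 interval is stated).

```python
# Hypothetical consolidation of the reported ST-SGG hyperparameters.
ST_SGG_CONFIG = {
    # SGDet detection head
    "num_entity_proposals": 80,    # top proposals by detector score
    "nms_iou": 0.5,                # per-class NMS threshold
    # Class-specific adaptive thresholding (Sec. 4.2)
    "alpha_search_interval": 0.2,  # grid-search step for alpha_inc / alpha_dec
    "tau_init": 0.0,               # initial threshold for every predicate class
    "max_pseudo_per_class": 3,     # cap per class, per image
    # Loss weighting (Equation 1)
    "beta": 1.0,                   # 0.1 when combined with the re-weight loss (Appendix E.4)
    # Graph structure learner
    "gsl_temperature": 0.5,        # tau in the structure learner
    "focal_gamma": 2.0,            # gamma in the focal loss
}
```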