Learning to Generate an Unbiased Scene Graph by Using Attribute-Guided Predicate Features
Authors: Lei Wang, Zejian Yuan, Badong Chen
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical results show that our method is substantially improved on all benchmarks and achieves new state-of-the-art performance for unbiased scene graph generation. |
| Researcher Affiliation | Academia | Institute of Artificial Intelligence and Robotics, Xi'an Jiaotong University, Xi'an, China. leiwangmail@stu.xjtu.edu.cn, {yuan.ze.jian, chenbd}@mail.xjtu.edu.cn |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/wanglei0618/A-PFG. |
| Open Datasets | Yes | Following previous works (Zellers et al. 2018; Tang et al. 2019; Yu et al. 2020; Li et al. 2021), the proposed method and recent methods are evaluated on the widely used subset of Visual Genome dataset (i.e., VG150) (Krishna et al. 2017) |
| Dataset Splits | Yes | Then, we divide it into 70% training set, 30% testing set, and 5k images selected from the training set for validation. |
| Hardware Specification | Yes | The PFRL is implemented on two NVIDIA 3090 GPUs with batch size 16 and learning rate 0.001 |
| Software Dependencies | No | The paper mentions using a pre-trained Faster R-CNN and a pre-trained GloVe language model, but does not provide specific version numbers for underlying software libraries such as Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | For the PFRL model, the numbers of object encoder layers and predicate encoder layers are 4 and 2, respectively, and the dimension of the predicate feature is 1024. For classifier fine-tuning, the number of instances for each predicate class Np is 5000, and the number of background features Nb is 5×10^6. For the A-PFG model, the encoders and decoders are 3-layer fully-connected networks, with each layer followed by the LeakyReLU activation function; the dimension of the predicate attribute embedding is 1024, and the dimensions of the latent variables zr and za are 256. The hyperparameter γ is 1, and β and δ are increased by 0.5 per epoch. The PFRL is implemented on two NVIDIA 3090 GPUs with batch size 16 and learning rate 0.001, and the classifier is fine-tuned with batch size 16 and learning rate 2×10^-6. The A-PFG model is trained for 200 epochs with batch size 64 and learning rate 2×10^-4. |
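For readability, the hyperparameters quoted in the Experiment Setup row can be gathered into a single configuration sketch. This is a hedged summary, not the authors' code: all key names are illustrative, and only the values come from the paper's reported setup.

```python
# Hedged sketch of the reported training configuration for A-PFG.
# Key names are illustrative assumptions; values are taken from the paper.

PFRL_CONFIG = {
    "object_encoder_layers": 4,
    "predicate_encoder_layers": 2,
    "predicate_feature_dim": 1024,
    "batch_size": 16,
    "learning_rate": 1e-3,
    "num_gpus": 2,  # two NVIDIA 3090 GPUs
}

CLASSIFIER_FINETUNE_CONFIG = {
    "instances_per_predicate": 5000,   # N_p
    "background_features": 5 * 10**6,  # N_b
    "batch_size": 16,
    "learning_rate": 2e-6,
}

APFG_CONFIG = {
    "encoder_decoder_layers": 3,   # fully-connected, LeakyReLU after each layer
    "attribute_embedding_dim": 1024,
    "latent_dim": 256,             # for both z_r and z_a
    "gamma": 1.0,
    "beta_delta_step_per_epoch": 0.5,  # β and δ increase by 0.5 each epoch
    "epochs": 200,
    "batch_size": 64,
    "learning_rate": 2e-4,
}
```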