Self-Supervised Relationship Probing
Authors: Jiuxiang Gu, Jason Kuen, Shafiq Joty, Jianfei Cai, Vlad Morariu, Handong Zhao, Tong Sun
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct extensive experiments to demonstrate that our method can benefit both vision and VL understanding tasks. |
| Researcher Affiliation | Collaboration | 1Adobe Research, 2Nanyang Technological University, 3Monash University |
| Pseudocode | No | The paper describes the model architecture and learning process using text and mathematical equations, but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | In contrast, we only aggregate pretraining data from the train (113k) and validation (5k) splits of MSCOCO [58]. |
| Dataset Splits | Yes | In contrast, we only aggregate pretraining data from the train (113k) and validation (5k) splits of MSCOCO [58]. |
| Hardware Specification | Yes | The training is carried out with four Tesla V100 GPUs with a batch size of 128 for 10 epochs. |
| Software Dependencies | No | The paper mentions several software components, such as "Faster-RCNN [46]", the "WordPiece tokenizer [47]", the "Adam optimizer [62]", and "Stanza [49]", but does not provide version numbers for these dependencies, which are needed for full reproducibility. |
| Experiment Setup | Yes | We set the numbers of layers for the intra-modality encoders f^SS_Intra and f^VV_Intra to 9 and 5, respectively, and the number of layers for the inter-modality encoders f^VS_Inter, f^SV_Inter, and f^VS_Inter to 5. For each transformer block, we set its hidden size to 768 and the number of heads to 12. To keep the sizes the same for the relationship matrices, the maximum numbers of words and objects are equally set to 36. ... At each iteration, we randomly mask input words and RoIs with a probability of 0.15. ... We use Adam optimizer [62] with a linear learning-rate schedule [13] and a peak learning rate of 1e-4. The training is carried out with four Tesla V100 GPUs with a batch size of 128 for 10 epochs. ... All variants of SSRP are trained for 30 epochs with Adam, a batch size of 512, and a learning rate of 5e-5. |
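To make the quoted setup concrete, the sketch below collects the reported hyperparameters into a single configuration object and illustrates the 0.15 random-masking step and the Adam-with-linear-schedule choice. Since the paper does not release code, every name here (`SSRPConfig`, `random_mask`, `mask_id`, the placeholder model, the step count) is an illustrative assumption rather than the authors' implementation, and the masking and schedule details are simplified stand-ins for what the paper describes.

```python
from dataclasses import dataclass
from typing import Tuple

import torch


@dataclass
class SSRPConfig:
    # Transformer sizes quoted in the "Experiment Setup" row.
    num_intra_text_layers: int = 9     # intra-modality encoder over sentences
    num_intra_visual_layers: int = 5   # intra-modality encoder over RoIs
    num_inter_layers: int = 5          # each inter-modality encoder
    hidden_size: int = 768
    num_attention_heads: int = 12
    max_words: int = 36                # equal word/object counts keep the
    max_objects: int = 36              # relationship matrices the same size
    mask_prob: float = 0.15            # masking probability for words and RoIs
    # Pretraining optimization (4x Tesla V100, Adam, linear LR schedule).
    pretrain_peak_lr: float = 1e-4
    pretrain_batch_size: int = 128
    pretrain_epochs: int = 10
    # Training of the SSRP variants.
    finetune_lr: float = 5e-5
    finetune_batch_size: int = 512
    finetune_epochs: int = 30


def random_mask(token_ids: torch.Tensor,
                mask_prob: float = 0.15,
                mask_id: int = 103) -> Tuple[torch.Tensor, torch.Tensor]:
    """Randomly replace a fraction of positions with a mask id.

    Simplified stand-in for the paper's masking of input words and RoIs;
    the actual procedure may use BERT-style 80/10/10 replacement.
    """
    mask = torch.rand(token_ids.shape) < mask_prob
    masked = token_ids.masked_fill(mask, mask_id)
    return masked, mask


if __name__ == "__main__":
    cfg = SSRPConfig()

    # Mask a dummy batch of word ids at the reported probability.
    ids = torch.randint(1000, 30000, (2, cfg.max_words))
    masked_ids, mask = random_mask(ids, cfg.mask_prob)
    print(f"masked {mask.float().mean():.1%} of positions")

    # Adam with a linearly decaying learning rate (decay-only sketch;
    # the paper's schedule peaks at 1e-4 and may include warmup).
    model = torch.nn.Linear(cfg.hidden_size, cfg.hidden_size)  # placeholder
    optimizer = torch.optim.Adam(model.parameters(), lr=cfg.pretrain_peak_lr)
    total_steps = 10_000  # illustrative step count
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda step: max(0.0, 1.0 - step / total_steps))
```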