Learning Visual Prior via Generative Pre-Training
Authors: Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments |
| Researcher Affiliation | Collaboration | Jinheng Xie1, Kai Ye2, Yudong Li2, Yuexiang Li3, Kevin Qinghong Lin1, Yefeng Zheng3, Linlin Shen2, Mike Zheng Shou1 — 1 Show Lab, National University of Singapore; 2 Shenzhen University; 3 Jarvis Research Center, Tencent YouTu Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | No | The project page https://sierkinhane.github.io/visor-gpt is provided, but the paper does not explicitly state that source code for the methodology is available there, and the link does not point directly to a code repository. |
| Open Datasets | Yes | Datasets. We collect around 4 million sequences from the publicly available datasets for VISORGPT. In particular, we consider three types of commonly used visual annotations, i.e., object bounding-box, human pose, and instance mask. In the MS-COCO dataset [23]... Beyond that, ~3.5 million bounding-box annotations of Objects365 [38] and Open Images [17] are also converted to sequences. |
| Dataset Splits | Yes | Evaluating the learned probabilistic prior, i.e., object location, shape, and relation among categories, on the val set of COCO, Objects365, and Open Images datasets. |
| Hardware Specification | Yes | All experimental evaluations were conducted on eight NVIDIA Tesla V100-32GB GPUs using PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch' and the 'DeepSpeed' framework but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We provide training details of VISORGPT in Tab. 3. VISORGPT adopted GPT-2 (base) architecture and was trained from scratch. ... Batch size 128 Iterations 200K Learning rate 5.0e-5 Sequence length n 1024 |
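The Open Datasets row notes that bounding-box annotations from MS-COCO, Objects365, and Open Images are converted into sequences for generative pre-training. The paper's exact serialization format is not quoted above, so the sketch below uses an invented plain-text layout purely to illustrate the idea of turning box annotations into a token sequence:

```python
def boxes_to_sequence(annotations, canvas=(512, 512)):
    """Serialize object bounding boxes into a plain-text sequence.

    `annotations` is a list of (category, (x1, y1, x2, y2)) pairs in
    pixel coordinates; coordinates are clamped to the canvas and emitted
    as integers so the sequence tokenizes compactly. The "box; <count>;"
    prefix mirrors the paper's idea of conditioning on annotation type,
    but the concrete format here is a hypothetical placeholder.
    """
    w, h = canvas
    parts = []
    for category, (x1, y1, x2, y2) in annotations:
        x1, x2 = max(0, min(int(x1), w)), max(0, min(int(x2), w))
        y1, y2 = max(0, min(int(y1), h)), max(0, min(int(y2), h))
        parts.append(f"{category} [{x1},{y1},{x2},{y2}]")
    return f"box; {len(parts)}; " + "; ".join(parts)


print(boxes_to_sequence([("person", (10, 20, 110, 220)),
                         ("dog", (150, 180, 300, 320))]))
# → box; 2; person [10,20,110,220]; dog [150,180,300,320]
```

A language model trained on millions of such strings can then learn the prior over object locations, shapes, and category co-occurrence that the paper evaluates.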
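The Experiment Setup row reports the Tab. 3 hyperparameters. Collecting them in one place, a back-of-the-envelope pass count can be derived from the ~4M-sequence corpus described in the Open Datasets row (the epoch estimate is our arithmetic, not a figure from the paper):

```python
# Hyperparameters as reported in Tab. 3 (GPT-2 base, trained from scratch).
config = {
    "architecture": "gpt2-base",   # GPT-2 base: 12 layers, 768-dim hidden
    "batch_size": 128,
    "iterations": 200_000,
    "learning_rate": 5.0e-5,
    "sequence_length": 1024,
}

# Total sequences consumed over training, and rough epochs over the
# ~4 million collected sequences (assumption: one sequence per sample).
sequences_seen = config["batch_size"] * config["iterations"]
approx_epochs = sequences_seen / 4_000_000
print(sequences_seen, round(approx_epochs, 1))  # → 25600000 6.4
```

At batch size 128 for 200K iterations, the model sees about 25.6M sequences, i.e., roughly 6.4 passes over the collected corpus.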