Learning Visual Prior via Generative Pre-Training

Authors: Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments
Researcher Affiliation | Collaboration | Jinheng Xie¹, Kai Ye², Yudong Li², Yuexiang Li³, Kevin Qinghong Lin¹, Yefeng Zheng³, Linlin Shen², Mike Zheng Shou¹ (¹Show Lab, National University of Singapore; ²Shenzhen University; ³Jarvis Research Center, Tencent YouTu Lab)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | https://sierkinhane.github.io/visor-gpt is provided, but the paper does not explicitly state that source code for the methodology is available there, nor is it a direct link to a code repository.
Open Datasets | Yes | "Datasets. We collect around 4 million sequences from the publicly available datasets for VISORGPT. In particular, we consider three types of commonly used visual annotations, i.e., object bounding-box, human pose, and instance mask. In the MS-COCO dataset [23]... Beyond that, ~3.5 million bounding-box annotations of Objects365 [38] and Open Images [17] are also converted to sequences." (A hedged sketch of such a sequence conversion follows the table.)
Dataset Splits | Yes | "Evaluating the learned probabilistic prior, i.e., object location, shape, and relation among categories, on the val set of COCO, Objects365, and Open Images datasets."
Hardware Specification | Yes | "All experimental evaluations were conducted on eight NVIDIA Tesla V100-32GB GPUs using PyTorch."
Software Dependencies | No | The paper mentions PyTorch and the DeepSpeed framework but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | "We provide training details of VISORGPT in Tab. 3. VISORGPT adopted GPT-2 (base) architecture and was trained from scratch. ..." Reported hyperparameters: batch size 128; iterations 200K; learning rate 5.0e-5; sequence length n = 1024. (A hedged training-setup sketch follows the table.)
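
To make the annotations-as-sequences idea in the Open Datasets row concrete, here is a minimal sketch of serializing COCO-style bounding boxes into a single text sequence. The segment layout ("category [x0 y0 x1 y1]" joined by ";") and the coordinate binning are illustrative assumptions, not the paper's exact serialization format.

```python
# Hypothetical sketch: turning COCO-style bounding-box annotations into one
# text sequence, in the spirit of VisorGPT's sequence conversion. The layout
# and the 512-bin coordinate quantization below are assumptions, not the
# paper's actual format.

def boxes_to_sequence(annotations, image_size=(640, 480), num_bins=512):
    """Convert [{'category': str, 'bbox': [x, y, w, h]}, ...] to one string."""
    width, height = image_size
    segments = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        # Quantize corner coordinates into discrete location bins so that
        # continuous positions become language-model-friendly tokens.
        x0 = int(x / width * (num_bins - 1))
        y0 = int(y / height * (num_bins - 1))
        x1 = int((x + w) / width * (num_bins - 1))
        y1 = int((y + h) / height * (num_bins - 1))
        segments.append(f"{ann['category']} [{x0} {y0} {x1} {y1}]")
    return "; ".join(segments)

if __name__ == "__main__":
    seq = boxes_to_sequence(
        [{"category": "person", "bbox": [120, 80, 60, 150]},
         {"category": "dog", "bbox": [300, 200, 90, 70]}]
    )
    print(seq)  # person [95 85 143 244]; dog [239 212 311 287]
```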
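
Similarly, the Experiment Setup row can be grounded in a minimal training sketch. It assumes the Hugging Face transformers GPT-2 implementation and a placeholder dataloader; the paper does not name an implementation, and the DeepSpeed and eight-GPU machinery it reports is omitted here.

```python
# Hypothetical sketch of the reported setup: GPT-2 (base) trained from
# scratch with batch size 128, 200K iterations, learning rate 5.0e-5, and
# sequence length 1024. The `transformers` GPT-2 classes are an assumption;
# DeepSpeed and the eight-V100 distribution from the paper are not shown.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50257,  # placeholder; VisorGPT's actual vocabulary is not stated here
    n_positions=1024,  # sequence length n = 1024 (Tab. 3)
    n_embd=768,        # GPT-2 base width
    n_layer=12,        # GPT-2 base depth
    n_head=12,
)
model = GPT2LMHeadModel(config)  # random init, i.e., trained from scratch
optimizer = torch.optim.AdamW(model.parameters(), lr=5.0e-5)  # lr from Tab. 3

BATCH_SIZE = 128     # global batch size (Tab. 3); the dataloader is assumed to yield it
MAX_ITERS = 200_000  # 200K iterations (Tab. 3)

def train(dataloader):
    """One pass over batches of tokenized annotation sequences."""
    model.train()
    for step, batch in enumerate(dataloader):
        if step >= MAX_ITERS:
            break
        # With labels == input_ids, the model returns the standard
        # next-token cross-entropy loss over the serialized sequences.
        out = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```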