Learning Visual Prior via Generative Pre-Training

Authors: Jinheng Xie, Kai Ye, Yudong Li, Yuexiang Li, Kevin Qinghong Lin, Yefeng Zheng, Linlin Shen, Mike Zheng Shou

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 Experiments
Researcher Affiliation | Collaboration | Jinheng Xie¹, Kai Ye², Yudong Li², Yuexiang Li³, Kevin Qinghong Lin¹, Yefeng Zheng³, Linlin Shen², Mike Zheng Shou¹ (¹Show Lab, National University of Singapore; ²Shenzhen University; ³Jarvis Research Center, Tencent YouTu Lab)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | https://sierkinhane.github.io/visor-gpt is provided, but the paper does not explicitly state that source code for the methodology is available there, nor is it a direct link to a code repository.
Open Datasets | Yes | "Datasets. We collect around 4 million sequences from the publicly available datasets for VISORGPT. In particular, we consider three types of commonly used visual annotations, i.e., object bounding-box, human pose, and instance mask. In the MS-COCO dataset [23]... Beyond that, ~3.5 million bounding-box annotations of Objects365 [38] and Open Images [17] are also converted to sequences." (A hedged sketch of such a sequence conversion follows the table.)
Dataset Splits | Yes | "Evaluating the learned probabilistic prior, i.e., object location, shape, and relation among categories, on the val set of COCO, Objects365, and Open Images datasets."
Hardware Specification | Yes | "All experimental evaluations were conducted on eight NVIDIA Tesla V100-32GB GPUs using PyTorch."
Software Dependencies | No | The paper mentions PyTorch and the DeepSpeed framework but does not provide specific version numbers for these software dependencies.
Experiment Setup | Yes | "We provide training details of VISORGPT in Tab. 3. VISORGPT adopted GPT-2 (base) architecture and was trained from scratch. ..." Reported hyperparameters: batch size 128; iterations 200K; learning rate 5.0e-5; sequence length n = 1024. (A hedged training-setup sketch follows the table.)
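
To make the annotations-as-sequences idea in the Open Datasets row concrete, here is a minimal sketch of serializing COCO-style bounding boxes into a single text sequence. The segment layout ("category [x0 y0 x1 y1]" joined by ";") and the coordinate binning are illustrative assumptions, not the paper's exact serialization format.

```python
# Hypothetical sketch: turning COCO-style bounding-box annotations into one
# text sequence, in the spirit of VisorGPT's sequence conversion. The layout
# and the 512-bin coordinate quantization below are assumptions, not the
# paper's actual format.

def boxes_to_sequence(annotations, image_size=(640, 480), num_bins=512):
    """Convert [{'category': str, 'bbox': [x, y, w, h]}, ...] to one string."""
    width, height = image_size
    segments = []
    for ann in annotations:
        x, y, w, h = ann["bbox"]
        # Quantize corner coordinates into discrete location bins so that
        # continuous positions become language-model-friendly tokens.
        x0 = int(x / width * (num_bins - 1))
        y0 = int(y / height * (num_bins - 1))
        x1 = int((x + w) / width * (num_bins - 1))
        y1 = int((y + h) / height * (num_bins - 1))
        segments.append(f"{ann['category']} [{x0} {y0} {x1} {y1}]")
    return "; ".join(segments)

if __name__ == "__main__":
    seq = boxes_to_sequence(
        [{"category": "person", "bbox": [120, 80, 60, 150]},
         {"category": "dog", "bbox": [300, 200, 90, 70]}]
    )
    print(seq)  # person [95 85 143 244]; dog [239 212 311 287]
```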
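
Similarly, the Experiment Setup row can be grounded in a minimal training sketch. It assumes the Hugging Face transformers GPT-2 implementation and a placeholder dataloader; the paper does not name an implementation, and the DeepSpeed and eight-GPU machinery it reports is omitted here.

```python
# Hypothetical sketch of the reported setup: GPT-2 (base) trained from
# scratch with batch size 128, 200K iterations, learning rate 5.0e-5, and
# sequence length 1024. The `transformers` GPT-2 classes are an assumption;
# DeepSpeed and the eight-V100 distribution from the paper are not shown.
import torch
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=50257,  # placeholder; VisorGPT's actual vocabulary is not stated here
    n_positions=1024,  # sequence length n = 1024 (Tab. 3)
    n_embd=768,        # GPT-2 base width
    n_layer=12,        # GPT-2 base depth
    n_head=12,
)
model = GPT2LMHeadModel(config)  # random init, i.e., trained from scratch
optimizer = torch.optim.AdamW(model.parameters(), lr=5.0e-5)  # lr from Tab. 3

BATCH_SIZE = 128     # global batch size (Tab. 3); the dataloader is assumed to yield it
MAX_ITERS = 200_000  # 200K iterations (Tab. 3)

def train(dataloader):
    """One pass over batches of tokenized annotation sequences."""
    model.train()
    for step, batch in enumerate(dataloader):
        if step >= MAX_ITERS:
            break
        # With labels == input_ids, the model returns the standard
        # next-token cross-entropy loss over the serialized sequences.
        out = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
        out.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```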