Pix2seq: A Language Modeling Framework for Object Detection
Authors: Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method on the MS-COCO 2017 detection dataset (Lin et al., 2014), containing 118k training images and 5k validation images. To compare with DETR and Faster R-CNN, we report average precision (AP), an integral metric over multiple thresholds, on validation set at the last training epoch. |
| Researcher Affiliation | Industry | Ting Chen, Saurabh Saxena, Lala Li, David J. Fleet, Geoffrey Hinton Google Research, Brain Team Correspondence to: iamtingchen@google.com |
| Pseudocode | Yes | Algorithm 1 Quantization of (normalized) coordinates |
| Open Source Code | Yes | Code and checkpoints available at https://github.com/google-research/pix2seq. |
| Open Datasets | Yes | We evaluate the proposed method on the MS-COCO 2017 detection dataset (Lin et al., 2014)... |
| Dataset Splits | Yes | We evaluate the proposed method on the MS-COCO 2017 detection dataset (Lin et al., 2014), containing 118k training images and 5k validation images. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud compute instances) used for training or running experiments. |
| Software Dependencies | No | The paper mentions common deep learning components like ResNet and Transformer architectures but does not specify version numbers for any software, libraries, or frameworks used in the experiments (e.g., Python, PyTorch, TensorFlow, CUDA versions). |
| Experiment Setup | Yes | For training from scratch, we follow (Carion et al., 2020) using a ResNet backbone (He et al., 2016), followed by 6 layers of transformer encoder and 6 layers of (causal) transformer decoder (Vaswani et al., 2017)... We resize images (with a fixed aspect ratio) so the longer side is 1333 pixels. For sequence construction, we use 2000 quantization bins... The model is trained for 300 epochs with a batch size of 128... We use AdamW optimizer (Kingma & Ba, 2014; Loshchilov & Hutter, 2018) with a learning rate of 0.003 and weight decay of 0.05. |