Tuning Computer Vision Models With Task Rewards

Authors: André Susano Pinto, Alexander Kolesnikov, Yuge Shi, Lucas Beyer, Xiaohua Zhai

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We illustrate some of our key results in Figure 1, highlighting both quantitative and qualitative improvements brought by reward optimization for object detection, panoptic segmentation, and image colorization. We believe this approach has the potential to be widely useful for better aligning models with a diverse range of computer vision tasks."
Researcher Affiliation | Collaboration | "¹Google DeepMind, Zurich, Switzerland. ²Work done during internship while being a PhD student at the University of Oxford. Correspondence to: André Susano Pinto <andresp@google.com>, Alexander Kolesnikov <akolesnikov@google.com>."
Pseudocode | Yes | "Algorithm 1 MLE optimization step" and "Algorithm 2 Reward optimization step" (a hedged sketch of the reward step appears below the table).
Open Source Code | Yes | "We use the big vision codebase (Beyer et al., 2022) for all experiments in this project." (Reference: Beyer, L., Zhai, X., and Kolesnikov, A. Big Vision. https://github.com/google-research/big_vision, 2022.)
Open Datasets | Yes | "We pretrain the model on the Objects365 dataset (Shao et al., 2019) and further finetune on the COCO (Lin et al., 2014) dataset." and "The model was trained using MLE on COCO panoptic dataset."
Dataset Splits | Yes | "Table 1. Panoptic segmentation results on COCO panoptic validation set after reward optimization." and "Analysis of reward distribution before and after tuning the model for the image captioning task. Measured as mean of 1024 validation examples."
Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used for the experiments.
Software Dependencies | No | The paper mentions software components such as Adafactor, BERT, and the big_vision codebase, but does not provide specific version numbers for general software dependencies (e.g., Python, PyTorch, TensorFlow, CUDA).
Experiment Setup | Yes | "Panoptic segmentation: We use REINFORCE rule to tune the MLE model for this reward, with a batch size of 128 for 30k steps with constant learning rate 10^-6 after a warmup of 4k steps." Appendix A provides detailed tables, e.g., Table 6 (object detection settings), which specifies MLE Objects365 pretraining at resolution 640x640 with batch size 256, learning rate 1e-3, weight decay 5e-5, a cosine schedule, 400,000 total steps, and 20,000 warmup steps (collected into a config sketch below).
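
To make the Pseudocode row concrete, here is a minimal sketch of a REINFORCE-style reward-optimization step of the kind the paper describes as Algorithm 2, written in JAX since big_vision builds on it. The function names `sample_fn`, `log_prob_fn`, and `reward_fn`, as well as the per-input mean baseline, are illustrative assumptions, not the paper's or the repository's actual API.

```python
import jax
import jax.numpy as jnp

def reward_optimization_step(params, inputs, rng,
                             sample_fn, log_prob_fn, reward_fn,
                             num_samples=8):
    """One REINFORCE update: sample outputs, score them with the task
    reward, and reinforce above-baseline samples via the log-likelihood
    gradient. All three *_fn arguments are hypothetical placeholders."""
    # Draw candidate outputs per input from the current model.
    samples = sample_fn(params, inputs, rng, num_samples)   # [B, S, ...]
    rewards = reward_fn(inputs, samples)                    # [B, S]
    # Per-input mean reward as a simple variance-reducing baseline
    # (an assumption; the paper may use a different baseline).
    baseline = jnp.mean(rewards, axis=1, keepdims=True)
    advantages = rewards - baseline                         # [B, S]

    def loss_fn(p):
        log_probs = log_prob_fn(p, inputs, samples)         # [B, S]
        # Advantages are treated as constants: REINFORCE only scales
        # the gradient of the log-likelihood.
        return -jnp.mean(jax.lax.stop_gradient(advantages) * log_probs)

    loss, grads = jax.value_and_grad(loss_fn)(params)
    # The caller applies `grads` with its optimizer (the paper mentions
    # Adafactor).
    return loss, grads
```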
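The hyperparameters quoted in the Experiment Setup row can be collected into a big_vision-style config sketch. This is a hedged illustration only: the key names are made up for readability and are not the repository's exact configuration schema, though the values match the quoted text.

```python
# Reward tuning of the MLE panoptic model (values quoted from the paper).
panoptic_reward_tuning = dict(
    objective="reinforce",     # tune the MLE model against the task reward
    batch_size=128,
    total_steps=30_000,
    learning_rate=1e-6,        # constant after warmup
    warmup_steps=4_000,
)

# MLE Objects365 pretraining for object detection (Appendix A, Table 6).
detection_mle_pretraining = dict(
    resolution=(640, 640),
    batch_size=256,
    learning_rate=1e-3,
    weight_decay=5e-5,
    schedule="cosine",
    total_steps=400_000,
    warmup_steps=20_000,
)
```

Note the contrast between the two stages: reward tuning uses a learning rate three orders of magnitude smaller than MLE pretraining, consistent with it being a short alignment phase on top of an already-trained model.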