reproducibilityindex.ai

Fine-Tuning Large Vision-Language Models as Decision-Making Agents via Reinforcement Learning

Authors: Simon Zhai, Hao Bai, Zipeng Lin, Jiayi Pan, Peter Tong, Yifei Zhou, Alane Suhr, Saining Xie, Yann LeCun, Yi Ma, Sergey Levine

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we demonstrate that our proposed framework enhances the decision-making capabilities of VLM agents across various tasks, enabling 7B models to outperform commercial models such as GPT-4V or Gemini.
Researcher Affiliation	Academia	UC Berkeley, UIUC, NYU
Pseudocode	Yes	Algorithm 1 Training VLM with RL
Open Source Code	Yes	Project page: https://rl4vlm.github.io/. Our supplementary materials contain all of our code, and we have provided a detailed readme.md file in the supplementary material for reproducing our experiments.
Open Datasets	Yes	We have prepared our own data for the supervised fine-tuning phase, and we have included an anonymized version of the dataset in the supplementary material for reproduction.
Dataset Splits	No	The paper does not explicitly provide percentages or sample counts for training/validation/test splits for the datasets used in its experiments.
Hardware Specification	Yes	All experiments are conducted on a DGX machine with 8 A100 GPUs (80 GB), while the maximum VRAM requirement is < 40 GB.
Software Dependencies	No	The paper mentions software such as DeepSpeed [51], PPO [27], and RoBERTa-base [36] but does not provide specific version numbers for these dependencies, which are required for a reproducible description.
Experiment Setup	Yes	For the CoT coefficient λ, we set λ = 0.5 in the gym_cards domain and λ = 0.2 in alfworld. The learning rate decays after every PPO update, each of which consists of 4 epochs of PPO gradient updates. The amount of on-policy training data and the batch size are task-dependent; we list them below. For one PPO update on each GPU, we collect 512 transitions, with a batch size of 128 per GPU (batch size = 512 in total).
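The two setup details in the Experiment Setup row (the CoT coefficient λ and the PPO update schedule) can be illustrated with a minimal Python sketch. It assumes that λ scales the summed log-probabilities of the chain-of-thought tokens while the action tokens keep weight 1, and that the 128 per-GPU batch size is the PPO minibatch size. The function names, the toy log-probabilities, and the linear learning-rate schedule are hypothetical; the actual decay form and learning-rate values are not stated in this entry.

```python
def action_log_prob(cot_token_logps, action_token_logps, lam):
    """Log-probability of one VLM output, with CoT tokens scaled by lam.

    Assumption: lam (the CoT coefficient) multiplies the summed log-probs
    of the chain-of-thought tokens; the action tokens keep weight 1.
    """
    return lam * sum(cot_token_logps) + sum(action_token_logps)


def gradient_steps_per_update(transitions_per_gpu, batch_size_per_gpu, ppo_epochs):
    """Minibatch gradient steps one GPU takes during a single PPO update.

    Assumption: the per-GPU batch size is the PPO minibatch size, and each
    epoch makes one full pass over the collected transitions.
    """
    if transitions_per_gpu % batch_size_per_gpu != 0:
        raise ValueError("transitions must split evenly into minibatches")
    return (transitions_per_gpu // batch_size_per_gpu) * ppo_epochs


def decayed_lr(initial_lr, final_lr, update_idx, num_updates):
    """Hypothetical linear decay, evaluated once per PPO update."""
    frac = min(update_idx / max(num_updates - 1, 1), 1.0)
    return initial_lr + (final_lr - initial_lr) * frac


if __name__ == "__main__":
    # gym_cards uses lam = 0.5; the per-token log-probs here are made up.
    print(action_log_prob([-1.0, -2.0], [-0.5], lam=0.5))  # -2.0
    # 512 transitions, minibatches of 128, 4 PPO epochs.
    print(gradient_steps_per_update(512, 128, 4))  # 16
    # Illustrative learning rates only; one value per PPO update.
    for update in range(3):
        print(update, decayed_lr(1e-5, 1e-6, update, num_updates=3))
```

Under these assumptions, the 512 transitions collected per GPU split into 4 minibatches of 128, so 4 PPO epochs give 16 gradient steps per GPU before the learning rate is decayed once. A smaller λ (0.2 in alfworld versus 0.5 in gym_cards) makes the long chain-of-thought contribute less to the action log-probability than the action tokens themselves.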