FuRL: Visual-Language Models as Fuzzy Rewards for Reinforcement Learning

Authors: Yuwei Fu, Haichao Zhang, Di Wu, Wei Xu, Benoit Boulet

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the Meta-world benchmark tasks demonstrate the efficacy of the proposed method. Code is available at: https://github.com/fuyw/FuRL.
Researcher Affiliation | Collaboration | Yuwei Fu (1,2), Haichao Zhang (2), Di Wu (1), Wei Xu (2), Benoit Boulet (1). 1: McGill University; 2: Horizon Robotics.
Pseudocode | Yes | Algorithm 1: Fuzzy VLM rewards aided RL (FuRL)
Open Source Code | Yes | Code is available at: https://github.com/fuyw/FuRL.
Open Datasets | Yes | We use ten robotics tasks from the Meta-world MT10 environment (Yu et al., 2020) with state-based observations and sparse rewards (referred to as Sparse Meta-world Tasks). (A minimal environment-usage sketch appears below the table.)
Dataset Splits | Yes | We report the average success rate P (%) in the evaluation at the last timestep across 5 random seeds after training. (See the evaluation sketch below the table.)
Hardware Specification | Yes | We run our experiments on a workstation with an NVIDIA GeForce RTX 3090 GPU and a 12th Gen Intel(R) Core(TM) i9-12900KF CPU.
Software Dependencies | Yes | In the experiments, we re-implement the SAC (Haarnoja et al., 2018) and DrQ (Yarats et al., 2021) baseline RL agents in JAX (Frostig et al., 2018). For the VLM model, we use the provided PyTorch code (Imambi et al., 2021) and checkpoints for both LIV and CLIP from the official LIV codebase, and we use the latest Meta-world environment. For the other main software packages, we use the following versions: jaxlib 0.4.16+cuda12.cudnn89 (cp39), gymnasium 0.29.1, imageio 2.33.1, optax 0.1.7, torch 2.1.2, torchvision 0.16.2, numpy 1.26.2.
Experiment Setup | Yes | The total number of environment steps is 1e6. We use the Adam optimizer with a learning rate of 0.0001. The VLM reward weight ρ is 0.05. For the VLM model, we use the pre-trained LIV (Ma et al., 2023a) from the official implementation. (A reward-shaping sketch using these values appears below the table.)
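
The Open Datasets row refers to the Meta-world MT10 benchmark. Below is a minimal sketch of instantiating one MT10 task with the metaworld package, assuming a gymnasium-style step API (consistent with the gymnasium 0.29.1 dependency listed above); deriving a sparse reward from the `success` flag in `info` is an illustrative assumption, not necessarily how the FuRL code handles it.

    import random
    import metaworld

    # Build the MT10 benchmark and pick one of its ten tasks (standard metaworld usage).
    mt10 = metaworld.MT10()
    name, env_cls = random.choice(list(mt10.train_classes.items()))
    env = env_cls()
    env.set_task(random.choice([t for t in mt10.train_tasks if t.env_name == name]))

    obs, info = env.reset()
    for _ in range(200):
        action = env.action_space.sample()
        obs, dense_reward, terminated, truncated, info = env.step(action)
        # Assumed sparse reward: 1.0 only when the task reports success.
        sparse_reward = float(info.get("success", 0.0))
        if terminated or truncated:
            obs, info = env.reset()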
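The Experiment Setup row reports a VLM reward weight ρ = 0.05. The sketch below shows one schematic way a VLM similarity reward could be folded into the sparse task reward; the cosine-similarity reward and the additive combination are illustrative assumptions, not the paper's exact Algorithm 1.

    import numpy as np

    RHO = 0.05  # VLM reward weight reported in the Experiment Setup row

    def vlm_reward(obs_embedding: np.ndarray, goal_embedding: np.ndarray) -> float:
        """Hypothetical VLM reward: cosine similarity between the embedding of the
        current observation and the embedding of the task description/goal."""
        denom = np.linalg.norm(obs_embedding) * np.linalg.norm(goal_embedding) + 1e-8
        return float(np.dot(obs_embedding, goal_embedding) / denom)

    def shaped_reward(sparse_reward: float,
                      obs_embedding: np.ndarray,
                      goal_embedding: np.ndarray) -> float:
        """Assumed additive shaping: sparse task reward plus the rho-weighted VLM term."""
        return sparse_reward + RHO * vlm_reward(obs_embedding, goal_embedding)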
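The Dataset Splits row states that the success rate P (%) is averaged over 5 random seeds at the last evaluation. A small aggregation sketch follows; the per-seed evaluation callable and the number of evaluation episodes are assumptions for illustration.

    import numpy as np

    def success_rate(run_episode, num_episodes: int = 10) -> float:
        """Fraction of evaluation episodes that end in success.
        `run_episode` is a hypothetical callable returning True on success."""
        return float(np.mean([bool(run_episode()) for _ in range(num_episodes)]))

    # Hypothetical per-seed evaluation callables, e.g. built from trained agents:
    # eval_fns = [make_eval_fn(seed) for seed in range(5)]
    # P = 100.0 * np.mean([success_rate(fn) for fn in eval_fns])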