Can Pre-Trained Text-to-Image Models Generate Visual Goals for Reinforcement Learning?

Authors: Jialu Gao, Kaizhe Hu, Guowei Xu, Huazhe Xu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | "We evaluate LfVoid across three simulated tasks and validate its feasibility in the corresponding real-world scenarios."
Researcher Affiliation | Academia | Jialu Gao¹, Kaizhe Hu¹,²,³, Guowei Xu¹, Huazhe Xu¹,²,³ (¹Tsinghua University, ²Shanghai Qi Zhi Institute, ³Shanghai AI Lab)
Pseudocode | Yes | "We provide a detailed description of the full algorithm in Appendix A.2."
Open Source Code | No | The paper offers only a project page ("Our project page: LfVoid.github.io"), not an explicit statement that the code is available or a direct link to a code repository.
Open Datasets | Yes | "The simulated tasks are developed based on the Robosuite benchmark, while we provide corresponding real-world tasks for each environment. A full description of the environments is provided in Appendix B.1." (A minimal Robosuite usage sketch follows the table.)
Dataset Splits | No | The paper mentions creating a "dataset of 1024 target images for each task" and training a discriminator on positive and negative samples, but it does not specify explicit train/validation/test splits, with percentages or counts, for its models or experiments. (See the second sketch after the table.)
Hardware Specification | No | The paper refers to "simulated tasks" and "real-robot environments" but does not report the hardware used for the experiments, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions using "DrQ-v2 [45] for visual RL training" and building the simulated tasks on the "Robosuite benchmark [51]", but it does not provide version numbers for these or any other software components.
Experiment Setup | No | The main text states only that "The training details can be found in Appendix B.3."; hyperparameters and other setup details are not reported in the main text.
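
Since the simulated tasks are built on Robosuite, the following is a minimal sketch of how a Robosuite environment with image observations is typically instantiated. The task name ("Lift"), robot ("Panda"), and camera settings are illustrative assumptions, not the paper's actual configuration.

```python
# Minimal Robosuite setup with camera observations (illustrative only;
# the paper's task names and settings may differ).
import robosuite as suite

env = suite.make(
    env_name="Lift",              # hypothetical task name
    robots="Panda",               # hypothetical robot choice
    has_renderer=False,
    has_offscreen_renderer=True,  # required for camera observations
    use_camera_obs=True,
    camera_names="agentview",
    camera_heights=84,            # 84x84 is a common visual-RL input size
    camera_widths=84,
)

obs = env.reset()
image = obs["agentview_image"]    # (84, 84, 3) RGB array fed to the visual policy
```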
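
To illustrate the kind of split information the assessment finds missing: the paper describes a discriminator trained on positive (goal) and negative images, drawing on a 1024-image target set per task. The sketch below shows one hypothetical way such a dataset could be partitioned with a declared train/validation ratio; all names and the 90/10 ratio are assumptions, not the paper's procedure.

```python
# Hypothetical train/validation split for a goal-image discriminator
# (illustrative; the paper does not report such a split).
import numpy as np

rng = np.random.default_rng(seed=0)

def make_split(positives, negatives, val_frac=0.1):
    """Shuffle labeled goal/non-goal images and split into train/val sets."""
    images = np.concatenate([positives, negatives])  # e.g. (2048, H, W, 3)
    labels = np.concatenate([np.ones(len(positives)), np.zeros(len(negatives))])
    order = rng.permutation(len(images))
    n_val = int(val_frac * len(images))              # 10% held out for validation
    val_idx, train_idx = order[:n_val], order[n_val:]
    return (images[train_idx], labels[train_idx]), (images[val_idx], labels[val_idx])

# Usage: positives = 1024 edited goal images, negatives = 1024 raw observations.
```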