Bridging Environments and Language with Rendering Functions and Vision-Language Models

Authors: Theo Cachet, Christopher R. Dance, Olivier Sigaud

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the proposed method on the Humanoid environment, showing that it results in LCAs that outperform MTRL baselines in zero-shot generalization, without requiring any textual task descriptions or other forms of environment-specific annotation during training.
Researcher Affiliation | Collaboration | 1. NAVER LABS Europe, Meylan; 2. Institute of Intelligent Systems and Robotics, Sorbonne University, Paris.
Pseudocode | Yes | Algorithm 1: Gradient-based configuration finetuning (an illustrative sketch of such a procedure follows the table).
Open Source Code | No | The paper provides a link to an interactive demo and videos (https://europe.naverlabs.com/text2control), but it does not explicitly state that the source code for the methodology is open-source or available via a repository link.
Open Datasets | Yes | We evaluate our approach on the Humanoid environment from OpenAI's Gym framework (Brockman et al., 2016)... Large-scale internet-scraped text and image data is a key enabler of current LLMs and text-to-image models (Schuhmann et al., 2022; Gadre et al., 2023; Penedo et al., 2023).
Dataset Splits | No | The paper mentions training and test sets but does not explicitly specify a validation split or describe how data was partitioned for validation in its own experiments.
Hardware Specification | Yes | using an NVIDIA RTX A6000 GPU and a 40-core Intel Xeon w7-2475X
Software Dependencies | Yes | We use the Humanoid environment from OpenAI's Gym framework (Brockman et al., 2016)... Rendering is performed using MuJoCo rendering functions... MuJoCo > 2.0.3
Experiment Setup | Yes | Table 6. Hyperparameters used when training the STRL, MTRL and GCRL agents with PPO:

Hyperparameter | Value
Clipping | 0.2
Discount factor, γ | 0.999
GAE parameter, λ | 0.95
Update time-step | 204 800 (MTRL and GCRL), 25 600 (STRL)
Batch size | 102 400 (MTRL and GCRL), 12 800 (STRL)
Epochs | 10
Learning rate | 5e-4
Learning-rate schedule | linear annealing
Gradient norm clipping | 0.5
Value clipping | no
Entropy coefficient | 2.5e-2
Value coefficient | 0.5
Activation function | GELU (Hendrycks & Gimpel, 2016)
Optimizer | AdamW (Loshchilov & Hutter, 2017)
Weight decay | 0.01
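The Pseudocode row refers to the paper's Algorithm 1, "Gradient-based configuration finetuning", whose details this report does not transcribe. The sketch below only illustrates what such a procedure could look like: gradient ascent on a VLM similarity score with respect to the environment configuration. It assumes a differentiable rendering function `render` and a differentiable scoring function `vlm_score`; these names, signatures, and the optimizer settings are hypothetical, not taken from the paper.

```python
import torch

def finetune_configuration(config, render, vlm_score, text_embedding,
                           steps=100, lr=1e-2):
    """Hypothetical sketch: finetune an environment configuration by
    gradient ascent on a VLM score of its rendering against a text goal.

    Assumes `render(c)` maps a configuration tensor to an image tensor
    differentiably, and `vlm_score(image, text_embedding)` returns a
    differentiable scalar similarity (e.g., a CLIP-style cosine score).
    """
    c = config.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([c], lr=lr)
    for _ in range(steps):
        image = render(c)                         # assumed differentiable
        score = vlm_score(image, text_embedding)  # scalar to maximize
        loss = -score                             # ascend by descending -score
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return c.detach()
```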
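The Table 6 hyperparameters map directly onto a PPO training configuration. Below is a minimal sketch collecting them in one place and wiring up the AdamW optimizer with linear learning-rate annealing; the `PPOConfig` name, the placeholder `policy` network, and `num_updates` are assumptions for illustration, not the authors' code.

```python
from dataclasses import dataclass

import torch

@dataclass
class PPOConfig:
    # Values transcribed from Table 6 (MTRL/GCRL; STRL variants in comments).
    clip_range: float = 0.2
    gamma: float = 0.999             # discount factor
    gae_lambda: float = 0.95         # GAE parameter
    update_timesteps: int = 204_800  # 25 600 for STRL
    batch_size: int = 102_400        # 12 800 for STRL
    epochs: int = 10
    learning_rate: float = 5e-4      # linearly annealed (scheduler below)
    max_grad_norm: float = 0.5       # gradient norm clipping
    clip_value_loss: bool = False    # "Value clipping: no"
    entropy_coef: float = 2.5e-2
    value_coef: float = 0.5
    weight_decay: float = 0.01       # AdamW weight decay

cfg = PPOConfig()
policy = torch.nn.Linear(376, 17)  # placeholder network for illustration
optimizer = torch.optim.AdamW(policy.parameters(),
                              lr=cfg.learning_rate,
                              weight_decay=cfg.weight_decay)
num_updates = 1_000  # assumed total number of PPO updates
scheduler = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1.0, end_factor=0.0, total_iters=num_updates)
```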