Bridging Environments and Language with Rendering Functions and Vision-Language Models
Authors: Théo Cachet, Christopher R. Dance, Olivier Sigaud
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method on the Humanoid environment, showing that it results in LCAs that outperform MTRL baselines in zero-shot generalization, without requiring any textual task descriptions or other forms of environment-specific annotation during training. |
| Researcher Affiliation | Collaboration | (1) NAVER LABS Europe, Meylan; (2) Institute of Intelligent Systems and Robotics, Sorbonne University, Paris. |
| Pseudocode | Yes | Algorithm 1: Gradient-based configuration finetuning (a hedged sketch follows the table). |
| Open Source Code | No | The paper provides a link to an interactive demo and videos (https://europe.naverlabs.com/text2control), but it does not explicitly state that the source code for the methodology is open-source or available via a repository link. |
| Open Datasets | Yes | We evaluate our approach on the Humanoid environment from OpenAI's Gym framework (Brockman et al., 2016)... Large-scale internet-scraped text and image data is a key enabler of current LLMs and text-to-image models (Schuhmann et al., 2022; Gadre et al., 2023; Penedo et al., 2023). (An instantiation example follows the table.) |
| Dataset Splits | No | The paper mentions training and test sets but does not specify a validation split or explain how data was partitioned for validation in its own experiments. |
| Hardware Specification | Yes | using an NVIDIA RTX A6000 GPU and a 40-core Intel Xeon w7-2475X |
| Software Dependencies | Yes | We use the Humanoid environment from OpenAI's Gym framework (Brockman et al., 2016)... Rendering is performed using MuJoCo rendering functions... MuJoCo > 2.0.3 |
| Experiment Setup | Yes | Table 6. Hyperparameters used when training the STRL, MTRL and GCRL agents with PPO: Clipping 0.2; Discount factor γ 0.999; GAE parameter λ 0.95; Update time-step 204 800 (MTRL and GCRL), 25 600 (STRL); Batch size 102 400 (MTRL and GCRL), 12 800 (STRL); Epochs 10; Learning rate 5e-4; Learning-rate schedule linear annealing; Gradient norm clipping 0.5; Value clipping no; Entropy coefficient 2.5e-2; Value coefficient 0.5; Activation function GELU (Hendrycks & Gimpel, 2016); Optimizer AdamW (Loshchilov & Hutter, 2017); Weight decay 0.01. (These values are collected into a config sketch below.) |
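
The pseudocode row references Algorithm 1, "Gradient-based configuration finetuning". The sketch below illustrates the general idea under stated assumptions: gradient ascent on a VLM text-image similarity score with respect to an environment configuration. Here `render_fn` and `vlm_image_encoder` are hypothetical stand-ins for a differentiable rendering function and a VLM image encoder; this is not the authors' released code.

```python
import torch

def finetune_configuration(config, text_embedding, render_fn, vlm_image_encoder,
                           steps=100, lr=1e-2):
    """Minimal sketch of gradient-based configuration finetuning.

    Hypothetical stand-ins, not the paper's code:
      render_fn         -- differentiable rendering function, config -> image
      vlm_image_encoder -- VLM image encoder, image -> embedding
    Performs gradient ascent on the cosine similarity between the rendered
    configuration's embedding and a fixed text embedding.
    """
    config = config.clone().requires_grad_(True)   # configuration as a torch.Tensor
    optimizer = torch.optim.Adam([config], lr=lr)
    for _ in range(steps):
        image = render_fn(config)                  # render the candidate configuration
        image_emb = vlm_image_encoder(image)       # embed the rendered image with the VLM
        score = torch.nn.functional.cosine_similarity(
            image_emb, text_embedding, dim=-1).mean()
        loss = -score                              # negate: ascend the similarity score
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return config.detach()
```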
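
For the open-datasets row, the Humanoid environment can be instantiated directly from Gym. The snippet assumes the `Humanoid-v4` task ID and the Gym 0.26+ reset/step API; the paper does not state the exact Gym version string it used.

```python
import gym

# Humanoid from OpenAI's Gym (Brockman et al., 2016), as quoted in the table.
# "Humanoid-v4" and the 0.26+ API are assumptions, not stated in the paper.
env = gym.make("Humanoid-v4")
obs, info = env.reset()
for _ in range(10):
    action = env.action_space.sample()  # random policy, for illustration only
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```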
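
The Table 6 hyperparameters quoted in the experiment-setup row can be collected into a single configuration object. This is a convenience sketch, not the authors' code: the field names are assumptions, the defaults follow the MTRL/GCRL values, and the STRL values appear in comments.

```python
from dataclasses import dataclass

@dataclass
class PPOConfig:
    """PPO hyperparameters from Table 6 (MTRL/GCRL defaults; STRL in comments)."""
    clip_range: float = 0.2
    discount_gamma: float = 0.999
    gae_lambda: float = 0.95
    update_timestep: int = 204_800   # 25_600 for STRL
    batch_size: int = 102_400        # 12_800 for STRL
    epochs: int = 10
    learning_rate: float = 5e-4
    lr_schedule: str = "linear_annealing"
    max_grad_norm: float = 0.5
    value_clipping: bool = False
    entropy_coef: float = 2.5e-2
    value_coef: float = 0.5
    activation: str = "GELU"         # Hendrycks & Gimpel, 2016
    optimizer: str = "AdamW"         # Loshchilov & Hutter, 2017
    weight_decay: float = 0.01

cfg = PPOConfig()  # MTRL/GCRL defaults as quoted from Table 6
```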