Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Bridging Environments and Language with Rendering Functions and Vision-Language Models
Authors: Theo Cachet, Christopher R Dance, Olivier Sigaud
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed methods on the Humanoid environment, showing that it results in LCAs that outperform MTRL baselines in zero-shot generalization, without requiring any textual task descriptions or other forms of environment-specific annotation during training. |
| Researcher Affiliation | Collaboration | (1) NAVER LABS Europe, Meylan; (2) Institute of Intelligent Systems and Robotics, Sorbonne University, Paris. |
| Pseudocode | Yes | Algorithm 1 Gradient-based configuration finetuning |
| Open Source Code | No | The paper provides a link to an interactive demo and videos (https://europe.naverlabs.com/text2control), but it does not explicitly state that the source code for the methodology is open-source or available via a repository link. |
| Open Datasets | Yes | We evaluate our approach on the Humanoid environment from OpenAI's Gym framework (Brockman et al., 2016)... Large-scale internet-scraped text and image data is a key enabler of current LLMs and text-to-image models (Schuhmann et al., 2022; Gadre et al., 2023; Penedo et al., 2023). |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly specify a validation dataset split or provide details for how data was partitioned for validation purposes in its own experiments. |
| Hardware Specification | Yes | using an NVIDIA RTX A6000 GPU and a 40-core Intel Xeon w7-2475X |
| Software Dependencies | Yes | We use the Humanoid environment from OpenAI's Gym framework (Brockman et al., 2016)... Rendering is performed using MuJoCo rendering functions... MuJoCo > 2.0.3 |
| Experiment Setup | Yes | Table 6. Hyperparameters used when training the STRL, MTRL and GCRL agents with PPO. Clipping: 0.2; Discount factor, γ: 0.999; GAE parameter, λ: 0.95; Update time-step: 204 800 (MTRL and GCRL), 25 600 (STRL); Batch size: 102 400 (MTRL and GCRL), 12 800 (STRL); Epochs: 10; Learning rate: 5e-4; Learning-rate schedule: linear annealing; Gradient norm clipping: 0.5; Value clipping: no; Entropy coefficient: 2.5e-2; Value coefficient: 0.5; Activation function: GELU (Hendrycks & Gimpel, 2016); Optimizer: AdamW (Loshchilov & Hutter, 2017); Weight decay: 0.01 |
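
For readers who want to reuse the reported setup, the Table 6 values quoted in the Experiment Setup row can be collected into a small configuration object. The following is a minimal sketch, not the authors' code: the names `PPOConfig`, `MULTI_TASK`, `SINGLE_TASK`, and `annealed_lr` are hypothetical, and the schedule assumes the learning rate is annealed linearly to zero over a total number of updates that the quoted table does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PPOConfig:
    """Hypothetical container for the PPO hyperparameters quoted from Table 6."""

    update_timestep: int  # environment steps collected per policy update
    batch_size: int       # samples per gradient step
    clip_range: float = 0.2
    gamma: float = 0.999          # discount factor
    gae_lambda: float = 0.95      # GAE parameter
    epochs: int = 10
    learning_rate: float = 5e-4
    max_grad_norm: float = 0.5    # gradient norm clipping
    clip_value: bool = False      # "Value clipping: no"
    entropy_coef: float = 2.5e-2
    value_coef: float = 0.5
    weight_decay: float = 0.01
    activation: str = "GELU"
    optimizer: str = "AdamW"

    @property
    def minibatches_per_epoch(self) -> int:
        # Derived quantity: how many batches fit into one update's worth of data.
        return self.update_timestep // self.batch_size


# Values reported for the MTRL and GCRL agents.
MULTI_TASK = PPOConfig(update_timestep=204_800, batch_size=102_400)
# Values reported for the STRL agents.
SINGLE_TASK = PPOConfig(update_timestep=25_600, batch_size=12_800)


def annealed_lr(config: PPOConfig, update_index: int, total_updates: int) -> float:
    """Linear annealing of the learning rate.

    Assumption: the rate decays to zero at `total_updates`; the end value and
    the total number of updates are not given in the quoted table.
    """
    fraction_left = 1.0 - update_index / total_updates
    return config.learning_rate * max(fraction_left, 0.0)
```

With these values, both settings split each update into two batches per epoch (204 800 / 102 400 and 25 600 / 12 800), which `minibatches_per_epoch` makes explicit.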