A Conservative Approach for Few-Shot Transfer in Off-Dynamics Reinforcement Learning
Authors: Paul Daoudi, Christophe Prieur, Bogdan Robu, Merwan Barlier, Ludovic Dos Santos
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of the FOOD algorithm in the off-dynamics setting in environments presenting different dynamics discrepancies, treated as black-box simulators. The code can be found at https://github.com/PaulDaoudi/FOOD. The environments are based on OpenAI Gym [Brockman et al., 2016] and the Minitaur environment [Coumans and Bai, 2016–2021] where the target environment has been modified by various mechanisms. These include gravity, friction, and mass modifications, as well as broken joint(s) systems for which DARC is known to perform well [Eysenbach et al., 2021, Section 6]. We also add the Low Fidelity Minitaur environment, highlighted in previous works [Desai et al., 2020; Yu et al., 2018] as a classical benchmark for evaluating agents in the off-dynamics setting. (A minimal environment-modification sketch follows the table.) |
| Researcher Affiliation | Collaboration | Paul Daoudi¹, Christophe Prieur², Bogdan Robu², Merwan Barlier¹ and Ludovic Dos Santos³. ¹Huawei Noah's Ark Lab; ²GIPSA-Lab; ³Criteo AI Lab. {paul.daoudi1, merwan.barlier}@huawei.com, l.dossantos@criteo.com, {christophe.prieur, bogdan.robu}@gipsa-lab.grenoble-inp.fr |
| Pseudocode | Yes | Algorithm 1: Few-shOt Off Dynamics (FOOD). |
| Open Source Code | Yes | The code can be found at https://github.com/PaulDaoudi/FOOD. |
| Open Datasets | Yes | The environments are based on OpenAI Gym [Brockman et al., 2016] and the Minitaur environment [Coumans and Bai, 2016–2021] where the target environment has been modified by various mechanisms. |
| Dataset Splits | No | The paper describes sampling trajectories from the target environment for fine-tuning but does not specify explicit training/validation/test dataset splits with percentages or counts for reproducing the data partitioning. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, or memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions software like 'OpenAI Gym' and the 'Minitaur environment' and refers to algorithms like 'A2C' and 'PPO', but it does not provide specific version numbers for any ancillary software dependencies (e.g., Python, PyTorch, or specific library versions). |
| Experiment Setup | Yes | Hyper-parameters Optimization. We optimize the hyper-parameters of the evaluated algorithms through a grid search for each different environment. Concerning DARC and ANE, we perform a grid search over their main hyper-parameter σ_DARC ∈ {0.0, 0.1, 0.5, 1} and σ_ANE ∈ {0.1, 0.2, 0.3, 0.5}. ... For CQL, we perform a grid search over the regularization strength β ∈ {5, 10}. ... For our proposed algorithm FOOD, the regularization strength hyper-parameter is selected over a grid search depending on the underlying RL agent: ∈ {0, 1, 5, 10} for A2C and ∈ {0.5, 1, 2, 5} for PPO. ... FOOD, DARC, and ANE are trained for 5000 epochs in the source environment... CQL is trained for 100000 gradient updates for Gravity Pendulum and 500000 gradient updates for all other environments. (A grid-search sketch follows the table.) |
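
The dynamics-shift mechanisms quoted in the Research Type row (gravity, friction, mass, broken joints) are applied to standard simulators treated as black boxes. The snippet below is a minimal sketch, not the authors' code, of how such a source/target pair could be built, assuming the classic-control Pendulum-v1 task whose gravity constant is exposed as `unwrapped.g`; the particular target gravity value is an illustrative assumption.

```python
# Illustrative sketch only (assumed API: gym's Pendulum-v1 exposes `unwrapped.g`).
import gym


def make_off_dynamics_pair(env_id="Pendulum-v1", target_gravity=14.0):
    """Return (source, target) environments; the target uses shifted gravity."""
    source = gym.make(env_id)            # simulator with nominal dynamics
    target = gym.make(env_id)            # stand-in for the "real" system
    target.unwrapped.g = target_gravity  # dynamics gap: heavier gravity
    return source, target


if __name__ == "__main__":
    src, tgt = make_off_dynamics_pair()
    print(src.unwrapped.g, tgt.unwrapped.g)  # 10.0 vs. 14.0
```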
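
Similarly, the grid search described in the Experiment Setup row can be summarized by the generic routine below. This is a hedged sketch under the assumption that each candidate regularization strength is trained over a few seeds and ranked by mean return on the target environment; `train_fn` and `eval_fn` are hypothetical placeholders for the paper's training and evaluation pipeline, which is not specified in the excerpt.

```python
# Generic grid-search sketch; the grids mirror the values quoted above,
# while the training/evaluation callables are hypothetical placeholders.
FOOD_GRIDS = {"A2C": [0, 1, 5, 10], "PPO": [0.5, 1, 2, 5]}  # FOOD regularization strengths
DARC_GRID = [0.0, 0.1, 0.5, 1]                              # sigma_DARC
ANE_GRID = [0.1, 0.2, 0.3, 0.5]                             # sigma_ANE


def grid_search(strengths, train_fn, eval_fn, seeds=(0, 1, 2)):
    """Return the strength whose trained policies achieve the best mean target return."""
    best_strength, best_return = None, float("-inf")
    for strength in strengths:
        mean_return = sum(eval_fn(train_fn(strength, seed)) for seed in seeds) / len(seeds)
        if mean_return > best_return:
            best_strength, best_return = strength, mean_return
    return best_strength


# Example usage (with hypothetical helpers `train_food` and `evaluate_on_target`):
# best = grid_search(FOOD_GRIDS["PPO"],
#                    train_fn=lambda s, seed: train_food("PPO", reg_strength=s, epochs=5000, seed=seed),
#                    eval_fn=evaluate_on_target)
```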