Time-Constrained Robust MDPs
Authors: Adil Zouitine, David Bertoin, Pierre Clavier, Matthieu Geist, Emmanuel Rachelson
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose three distinct algorithms, each using varying levels of environmental information, and evaluate them extensively on continuous control benchmarks. Our results demonstrate that these algorithms yield an efficient tradeoff between performance and robustness, outperforming traditional deep robust RL methods in time-constrained environments while preserving robustness in classical benchmarks. |
| Researcher Affiliation | Collaboration | Adil Zouitine 1,2, David Bertoin 1,3,6, Pierre Clavier 4,5, Matthieu Geist 7, Emmanuel Rachelson 2,6; 1 IRT Saint-Exupéry, 2 ISAE-SUPAERO, Université de Toulouse, 3 IMT, INSA Toulouse, 4 École Polytechnique, CMAP, 5 Inria Paris, HeKA, 6 ANITI, 7 Cohere; {adil.zouitine, david.bertoin}@irt-saintexupery.com, pierre.clavier@polytechnique.edu |
| Pseudocode | Yes | Algorithm 1 Time-constrained robust training |
| Open Source Code | No | The paper states in the NeurIPS checklist that code is provided for reproduction, but neither the main body nor the appendices contain an explicit statement of code release for the authors' specific methodology or a direct link to a repository. |
| Open Datasets | Yes | Experimental validation was conducted in continuous control scenarios using the MuJoCo simulation environments [5]. (A minimal environment-loading sketch follows the table.) |
| Dataset Splits | No | The paper does not provide training/validation/test dataset splits. As a reinforcement learning paper, it evaluates trained policies directly in the environments rather than using the data splits common in supervised learning. |
| Hardware Specification | Yes | All experiments were run on a desktop machine (Intel i9, 10th generation processor, 64GB RAM) with a single NVIDIA RTX 4090 GPU. |
| Software Dependencies | No | The paper mentions using the official M2TD3 [18] implementation and the TD3 implementation from the CleanRL library [32], but it does not provide specific version numbers for these software components or other dependencies like Python or PyTorch. |
| Experiment Setup | Yes | Table 5: Hyperparameters for the M2TD3 Agent and Table 6: Hyperparameters for the TD3 Agent provide specific hyperparameter values such as Batch Size, Learning Rate, and Gamma. (A hedged configuration sketch follows the table.) |
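
As a hedged illustration of the Open Datasets row, the sketch below shows how MuJoCo continuous-control benchmarks of the kind used in the paper can be instantiated through the Gymnasium API. The environment ids, the `make_env` helper, and the mass-scaling perturbation are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch (assumption): loading MuJoCo continuous-control benchmarks
# via Gymnasium and applying a simple physical perturbation, as is common
# when evaluating robust RL agents. Environment ids and the mass scaling
# below are illustrative, not taken from the paper.
import gymnasium as gym

ENV_IDS = ["HalfCheetah-v4", "Hopper-v4", "Walker2d-v4", "Ant-v4"]  # assumed list

def make_env(env_id: str, mass_scale: float = 1.0):
    """Create a MuJoCo environment and optionally scale body masses to
    emulate a shifted transition kernel (hypothetical helper)."""
    env = gym.make(env_id)
    env.unwrapped.model.body_mass[:] *= mass_scale  # perturb the dynamics
    return env

if __name__ == "__main__":
    env = make_env("Hopper-v4", mass_scale=1.5)
    obs, info = env.reset(seed=0)
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    env.close()
```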
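
For the Experiment Setup row, the paper's Tables 5 and 6 contain the actual hyperparameter values; the sketch below only illustrates how such a configuration might be organized in code. Every value is a placeholder drawn from common TD3 defaults, not from the paper.

```python
# Hypothetical configuration sketch: field names mirror the kinds of
# hyperparameters reported in Tables 5 and 6 (batch size, learning rate,
# gamma), but all values are placeholders, not the paper's settings.
from dataclasses import dataclass

@dataclass
class AgentConfig:
    batch_size: int = 256        # placeholder; see Tables 5/6 for actual values
    learning_rate: float = 3e-4  # placeholder
    gamma: float = 0.99          # discount factor, placeholder
    tau: float = 0.005           # target-network soft-update rate, placeholder
    policy_noise: float = 0.2    # TD3 target-policy smoothing noise, placeholder
    policy_delay: int = 2        # TD3 delayed policy updates, placeholder

td3_config = AgentConfig()
m2td3_config = AgentConfig(learning_rate=1e-3)  # illustrative override only
```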