Time-Constrained Robust MDPs

Authors: Adil Zouitine, David Bertoin, Pierre Clavier, Matthieu Geist, Emmanuel Rachelson

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We propose three distinct algorithms, each using varying levels of environmental information, and evaluate them extensively on continuous control benchmarks. Our results demonstrate that these algorithms yield an efficient tradeoff between performance and robustness, outperforming traditional deep robust RL methods in time-constrained environments while preserving robustness in classical benchmarks.
Researcher Affiliation | Collaboration | Adil Zouitine 1,2, David Bertoin 1,3,6, Pierre Clavier 4,5, Matthieu Geist 7, Emmanuel Rachelson 2,6. Affiliations: 1 IRT Saint-Exupéry, 2 ISAE-SUPAERO, Université de Toulouse, 3 IMT, INSA Toulouse, 4 École Polytechnique, CMAP, 5 Inria Paris, HeKA, 6 ANITI, 7 Cohere. Contact: {adil.zouitine, david.bertoin}@irt-saintexupery.com, pierre.clavier@polytechnique.edu
Pseudocode | Yes | Algorithm 1: Time-constrained robust training
Open Source Code | No | The NeurIPS checklist states that code is provided for reproduction, but neither the main body nor the appendices contains an explicit code-release statement for the authors' specific methodology or a direct link to a repository.
Open Datasets | Yes | Experimental validation was conducted in continuous control scenarios using the MuJoCo simulation environments [5].
Dataset Splits | No | The paper does not provide training/validation/test dataset splits. As a reinforcement learning paper, it evaluates trained policies rather than using the data splits common in supervised learning.
Hardware Specification | Yes | All experiments were run on a desktop machine (Intel i9, 10th-generation processor, 64 GB RAM) with a single NVIDIA RTX 4090 GPU.
Software Dependencies | No | The paper mentions using the official M2TD3 [18] implementation and the TD3 implementation from the CleanRL library [32], but it does not provide version numbers for these components or for other dependencies such as Python or PyTorch.
Experiment Setup | Yes | Table 5 (Hyperparameters for the M2TD3 Agent) and Table 6 (Hyperparameters for the TD3 Agent) provide specific hyperparameter values such as batch size, learning rate, and gamma.
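The paper's Algorithm 1 (time-constrained robust training) is not reproduced in this report. Purely as a hypothetical illustration of the time-constraint idea, the toy episode loop below caps the number of steps per episode during which an adversary may perturb the environment dynamics; every name and value here is an assumption for exposition, not the authors' implementation.

```python
import random

def time_constrained_episode(step_fn, horizon=100, budget=0.2, rng=None):
    """Run one episode in which an adversary may perturb the dynamics
    for at most `budget * horizon` steps (the time constraint).

    `step_fn(adversarial)` is a stand-in for one environment step and
    returns the reward; `adversarial=True` means perturbed dynamics.
    """
    rng = rng or random.Random(0)
    remaining = int(budget * horizon)  # adversary's per-episode time budget
    total, used = 0.0, 0
    for _ in range(horizon):
        adversarial = remaining > 0 and rng.random() < budget
        if adversarial:
            remaining -= 1
            used += 1
        total += step_fn(adversarial)
    return total, used
```

With a reward of 1 per nominal step and -1 per perturbed step, the adversary can never act for more than `budget * horizon` steps, so the worst-case return is bounded by the time constraint rather than by worst-case dynamics at every step.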
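Tables 5 and 6 themselves are not reproduced here. As a shape-only sketch of how such a hyperparameter table might be captured and sanity-checked in code, the snippet below uses common TD3 defaults as placeholder values; these are NOT the paper's reported settings.

```python
# Illustrative only: field names mirror the kind of entries listed in
# Tables 5 and 6; the values are common TD3 defaults, not the paper's.
TD3_HPARAMS = {
    "batch_size": 256,
    "learning_rate": 3e-4,
    "gamma": 0.99,       # discount factor
    "tau": 0.005,        # target-network soft-update rate
    "policy_noise": 0.2,
    "noise_clip": 0.5,
    "policy_delay": 2,   # delayed policy updates
}

def validate(hparams):
    """Basic sanity checks before launching a run."""
    assert 0.0 < hparams["gamma"] <= 1.0, "discount must be in (0, 1]"
    assert hparams["batch_size"] > 0
    assert hparams["learning_rate"] > 0
    return hparams
```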