Adversarially Robust Decision Transformer

Authors: Xiaohang Tang, Afonso Marques, Parameswaran Kamalaruban, Ilija Bogunovic

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental In this section, we conduct experiments to examine the robustness of our algorithm, Adversarially Robust Decision Transformer (ARDT), in three settings: (i) short-horizon sequential games, where the offline dataset has full coverage and the test-time adversary is optimal (Section 4.1); (ii) a long-horizon sequential game, Connect Four, where the offline dataset has only partial coverage and the test-time adversary is distributionally shifted (Section 4.2); and (iii) standard continuous MuJoCo tasks in the adversarial setting, with a population of test-time adversaries (Section 4.3).
Researcher Affiliation Collaboration Xiaohang Tang University College London xiaohang.tang.20@ucl.ac.uk Afonso Marques University College London afonso.marques.22@ucl.ac.uk Parameswaran Kamalaruban Featurespace kamal.parameswaran@featurespace.co.uk Ilija Bogunovic University College London i.bogunovic@ucl.ac.uk
Pseudocode Yes Algorithm 1 Adversarially Robust Decision Transformer (ARDT)
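The paper's hyperparameters report an expectile level of 0.01, which suggests ARDT relabels returns with a low-expectile regression to approximate the worst-case (minimax) return against the adversary. A minimal sketch of the standard asymmetric expectile loss is shown below; the function name and scalar interface are illustrative, not taken from the paper's codebase.

```python
def expectile_loss(pred, target, tau=0.01):
    # Asymmetric squared error. With tau << 0.5, overestimates
    # (pred > target) are weighted by 1 - tau (~0.99), so the
    # regressed value is pushed toward the minimum of the target
    # distribution, i.e. a pessimistic / worst-case return estimate.
    u = target - pred
    weight = tau if u >= 0 else 1.0 - tau
    return weight * u * u
```

With tau = 0.01, overshooting the target by 1 costs 0.99 while undershooting by 1 costs only 0.01, which is what drives the estimate toward the adversarial worst case.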
Open Source Code Yes We publish our datasets along with the codebase via https://github.com/xiaohangt/ardt. There are no data access restrictions.
Open Datasets Yes We publish our datasets along with the codebase via https://github.com/xiaohangt/ardt. There are no data access restrictions. The MuJoCo data profiles are in Tables 4 and 5. The MuJoCo data contains 1000 trajectories, each with 1000 steps of interaction. The Connect Four datasets also have 10^6 steps of interaction in total, where each trajectory has a length of at most 22.
Dataset Splits No The paper mentions "Number of training steps" and "Number of testing iterations" in Table 3, but does not explicitly describe a separate validation split or the percentage/number of samples used for validation.
Hardware Specification Yes We conduct experiments on GPUs: a GeForce RTX 2080 Ti with 11GB of memory, and an NVIDIA A100 with 80GB of memory.
Software Dependencies No The paper states that its implementation is based on others and mentions environments such as MuJoCo, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or TensorFlow.
Experiment Setup Yes Process Hyperparameters Values (Full coverage game / Connect Four / MuJoCo) ... Learning rate 0.0001 | Weight decay 0.0001 | Warmup steps 1000 | Dropout 0.1 | Batch size 128/128/512 | Optimizer AdamW ... Expectile level 0.01
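The reported optimization settings (learning rate 0.0001, 1000 warmup steps) imply a warmup schedule, though the paper does not specify its exact shape. A minimal sketch assuming linear warmup followed by a constant rate; the function name and the constant-after-warmup choice are assumptions, not details from the paper.

```python
def warmup_lr(step, base_lr=1e-4, warmup_steps=1000):
    # Linear warmup to the reported learning rate (1e-4) over the
    # reported 1000 warmup steps, then held constant. The constant
    # tail is an assumption; the paper only lists the two values.
    return base_lr * min(1.0, step / warmup_steps)
```

For example, at step 500 this yields half the base rate (5e-5), and any step past 1000 returns the full 1e-4.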