Adversarially Robust Decision Transformer
Authors: Xiaohang Tang, Afonso Marques, Parameswaran Kamalaruban, Ilija Bogunovic
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conduct experiments to examine the robustness of our algorithm, Adversarially Robust Decision Transformer (ARDT), in three settings: (i) short-horizon sequential games, where the offline dataset has full coverage and the test-time adversary is optimal (Section 4.1); (ii) a long-horizon sequential game, Connect Four, where the offline dataset has only partial coverage and the test-time adversary is distributionally shifted (Section 4.2); and (iii) the standard continuous MuJoCo tasks in the adversarial setting with a population of test-time adversaries (Section 4.3). |
| Researcher Affiliation | Collaboration | Xiaohang Tang University College London xiaohang.tang.20@ucl.ac.uk Afonso Marques University College London afonso.marques.22@ucl.ac.uk Parameswaran Kamalaruban Featurespace kamal.parameswaran@featurespace.co.uk Ilija Bogunovic University College London i.bogunovic@ucl.ac.uk |
| Pseudocode | Yes | Algorithm 1 Adversarially Robust Decision Transformer (ARDT) |
| Open Source Code | Yes | We publish our datasets along with the codebase via https://github.com/xiaohangt/ardt. There are no data access restrictions. |
| Open Datasets | Yes | We publish our datasets along with the codebase via https://github.com/xiaohangt/ardt. There are no data access restrictions. The MuJoCo data profiles are in Tables 4 and 5. The MuJoCo data comprises 1000 trajectories, each with 1000 steps of interaction. The Connect Four datasets also contain 10^6 steps of interaction in total, where each trajectory has a length of at most 22. |
| Dataset Splits | No | The paper mentions "Number of training steps" and "Number of testing iterations" in Table 3, but does not explicitly describe a separate validation split or the percentage/number of samples used for validation. |
| Hardware Specification | Yes | We conduct experiments on GPUs: a GeForce RTX 2080 Ti with 11 GB of memory, and an NVIDIA A100 with 80 GB of memory. |
| Software Dependencies | No | The paper states that its implementation is based on others and mentions environments like MuJoCo, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or TensorFlow libraries. |
| Experiment Setup | Yes | Process Hyperparameters Values (Full coverage game/Connect Four/MuJoCo) ... Learning rate 0.0001 Weight decay 0.0001 Warm-up steps 1000 Dropout 0.1 Batch size 128/128/512 Optimizer AdamW ... Expectile level 0.01 |