Explaining RL Decisions with Trajectories
Authors: Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the proposed approach in terms of quality of attributions as well as practical scalability in diverse environments that involve both discrete and continuous state and action spaces such as grid-worlds, video games (Atari) and continuous control (MuJoCo). We also conduct a human study on a simple navigation task to observe how their understanding of the task compares with data attributed for a trained RL policy. |
| Researcher Affiliation | Collaboration | ¹Media and Data Science Research, Adobe; ²International Institute of Information Technology Hyderabad; ³University of Illinois Urbana-Champaign; ⁴Adobe Research |
| Pseudocode | Yes | Algorithm 1: encodeTrajectories, Algorithm 2: clusterTrajectories, Algorithm 3: generateDataEmbedding, Algorithm 4: trainExpPolicies, Algorithm 5: generateClusterAttribution, Algorithm 6: Trajectory Attribution in Offline RL (a hedged pipeline sketch follows the table) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its own source code or a link to a code repository. |
| Open Datasets | Yes | For Seaquest, we collect offline data of 717 trajectories from the D4RL-Atari repository and use a pre-trained decision transformer as trajectory encoder. Similarly, for HalfCheetah, we collect offline data of 1000 trajectories from the D4RL repository (Fu et al., 2020) and use a pre-trained trajectory transformer as a trajectory encoder. (A hedged data-loading sketch follows the table.) |
| Dataset Splits | No | The paper describes training on collected data, but it does not specify explicit train/validation/test dataset splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | Yes | The trainings were performed parallelly on a single Nvidia-A100 GPU hardware. |
| Software Dependencies | No | The paper mentions 'd3rlpy implementations (Takuma Seno, 2021)' but does not provide specific version numbers for this or any other software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We used the critic learning rate of 3 × 10⁻⁴ and the actor learning rate of 3 × 10⁻⁴ with a batch size of 256. The trainings were performed parallelly on a single Nvidia-A100 GPU hardware. We again used the critic learning rate of 3 × 10⁻⁴ and the actor learning rate of 3 × 10⁻⁴ with a batch size of 512. (A hedged configuration sketch follows the table.) |
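
The six routines listed in the Pseudocode row form the paper's trajectory-attribution pipeline: encode trajectories, cluster them, embed the resulting data, train "explanation" policies on complementary datasets, and attribute a decision to clusters. The sketch below is a minimal, hypothetical rendering of that flow, not the authors' code: the mean-pooling encoder, the k-means clustering, and the `train_fn` / `decision_fn` callables are all assumptions made for illustration.

```python
# Hypothetical sketch of the six-stage pipeline named in the Pseudocode row.
# Everything concrete here (mean-pooling encoder, k-means clustering, the
# train_fn / decision_fn callables) is an illustrative assumption, not the
# authors' implementation.
import numpy as np
from sklearn.cluster import KMeans


def encode_trajectories(trajectories):
    """Map each trajectory (a [steps, features] array) to one embedding."""
    return np.stack([traj.mean(axis=0) for traj in trajectories])


def cluster_trajectories(embeddings, n_clusters=8):
    """Group trajectory embeddings; returns one cluster label per trajectory."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings).labels_


def generate_data_embedding(embeddings):
    """Summarise a set of trajectory embeddings as a single dataset embedding."""
    return embeddings.mean(axis=0)


def train_exp_policies(trajectories, labels, train_fn):
    """Train one 'explanation' policy per complementary dataset
    (all trajectories except those in one cluster)."""
    return {
        c: train_fn([t for t, l in zip(trajectories, labels) if l != c])
        for c in np.unique(labels)
    }


def generate_cluster_attribution(state, decision_fn, exp_policies):
    """Attribute the decision at `state` to clusters whose removal flips it."""
    original_action = decision_fn(state)
    return [c for c, pi in exp_policies.items() if pi(state) != original_action]


# Tiny demo on random data (first three stages only).
rng = np.random.default_rng(0)
trajectories = [rng.normal(size=(20, 4)) for _ in range(40)]
embeddings = encode_trajectories(trajectories)
labels = cluster_trajectories(embeddings, n_clusters=4)
data_embedding = generate_data_embedding(embeddings)
```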
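
The Open Datasets row points to the D4RL benchmark (Fu et al., 2020) and the D4RL-Atari repository. A minimal example of pulling HalfCheetah offline data with the public `d4rl` package is shown below; the particular dataset tag (`halfcheetah-medium-v2`) is an assumption, since the quoted text only says that 1000 trajectories were collected from D4RL.

```python
# Hedged example: loading HalfCheetah offline data via the public d4rl package.
# The specific task/quality tag ("halfcheetah-medium-v2") is an assumption;
# the paper only states that 1000 trajectories were taken from D4RL.
import gym
import d4rl  # importing d4rl registers the offline-RL environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', ...

print(dataset["observations"].shape, dataset["actions"].shape)
```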
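
The Experiment Setup row quotes actor and critic learning rates of 3 × 10⁻⁴ with batch sizes of 256 and 512, and the Software Dependencies row names d3rlpy without a version. The sketch below plugs those numbers into a d3rlpy 1.x-style SAC constructor; the choice of SAC, the 1.x API, and the training-step count are assumptions rather than details reported in the paper.

```python
# Hedged sketch: the quoted hyperparameters plugged into a d3rlpy 1.x-style
# constructor. The algorithm (SAC) and the d3rlpy version are assumptions;
# the paper only mentions "d3rlpy implementations" without pinning either.
import d3rlpy

algo = d3rlpy.algos.SAC(
    actor_learning_rate=3e-4,   # quoted actor learning rate
    critic_learning_rate=3e-4,  # quoted critic learning rate
    batch_size=256,             # 512 was quoted for the second setting
    use_gpu=True,               # trainings were run on a single Nvidia A100
)
# algo.fit(mdp_dataset, n_steps=500_000)  # mdp_dataset: a d3rlpy MDPDataset placeholder
```

Note that newer d3rlpy releases route the same settings through a config object instead of constructor keywords, so the exact call depends on which version is installed.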