Explaining RL Decisions with Trajectories
Authors: Shripad Vilasrao Deshmukh, Arpan Dasgupta, Balaji Krishnamurthy, Nan Jiang, Chirag Agarwal, Georgios Theocharous, Jayakumar Subramanian
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of the proposed approach in terms of quality of attributions as well as practical scalability in diverse environments that involve both discrete and continuous state and action spaces such as grid-worlds, video games (Atari) and continuous control (MuJoCo). We also conduct a human study on a simple navigation task to observe how their understanding of the task compares with data attributed for a trained RL policy. |
| Researcher Affiliation | Collaboration | ¹Media and Data Science Research, Adobe; ²International Institute of Information Technology Hyderabad; ³University of Illinois Urbana-Champaign; ⁴Adobe Research |
| Pseudocode | Yes | Algorithm 1: encodeTrajectories, Algorithm 2: clusterTrajectories, Algorithm 3: generateDataEmbedding, Algorithm 4: trainExpPolicies, Algorithm 5: generateClusterAttribution, Algorithm 6: Trajectory Attribution in Offline RL (a hedged pipeline sketch follows the table) |
| Open Source Code | No | The paper does not provide any explicit statement about releasing its own source code or a link to a code repository. |
| Open Datasets | Yes | For Seaquest, we collect offline data of 717 trajectories from the D4RL-Atari repository and use a pre-trained decision transformer as trajectory encoder. Similarly, for HalfCheetah, we collect offline data of 1000 trajectories from the D4RL repository (Fu et al., 2020) and use a pre-trained trajectory transformer as a trajectory encoder. (A hedged data-loading sketch follows the table.) |
| Dataset Splits | No | The paper describes training on collected data, but it does not specify explicit train/validation/test dataset splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | Yes | The trainings were performed parallelly on a single Nvidia-A100 GPU hardware. |
| Software Dependencies | No | The paper mentions 'd3rlpy implementations (Takuma Seno, 2021)' but does not provide specific version numbers for this or any other software dependencies like Python, PyTorch, or TensorFlow. |
| Experiment Setup | Yes | We used the critic learning rate of 3 × 10⁻⁴ and the actor learning rate of 3 × 10⁻⁴ with a batch size of 256. The trainings were performed parallelly on a single Nvidia-A100 GPU hardware. We again used the critic learning rate of 3 × 10⁻⁴ and the actor learning rate of 3 × 10⁻⁴ with a batch size of 512. (A hedged configuration sketch follows the table.) |
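
The six routines listed in the Pseudocode row form the paper's trajectory-attribution pipeline: encode trajectories, cluster them, embed the resulting data, train "explanation" policies on complementary datasets, and attribute a decision to clusters. The sketch below is a minimal, hypothetical rendering of that flow, not the authors' code: the mean-pooling encoder, the k-means clustering, and the `train_fn` / `decision_fn` callables are all assumptions made for illustration.

```python
# Hypothetical sketch of the six-stage pipeline named in the Pseudocode row.
# Everything concrete here (mean-pooling encoder, k-means clustering, the
# train_fn / decision_fn callables) is an illustrative assumption, not the
# authors' implementation.
import numpy as np
from sklearn.cluster import KMeans


def encode_trajectories(trajectories):
    """Map each trajectory (a [steps, features] array) to one embedding."""
    return np.stack([traj.mean(axis=0) for traj in trajectories])


def cluster_trajectories(embeddings, n_clusters=8):
    """Group trajectory embeddings; returns one cluster label per trajectory."""
    return KMeans(n_clusters=n_clusters, n_init=10).fit(embeddings).labels_


def generate_data_embedding(embeddings):
    """Summarise a set of trajectory embeddings as a single dataset embedding."""
    return embeddings.mean(axis=0)


def train_exp_policies(trajectories, labels, train_fn):
    """Train one 'explanation' policy per complementary dataset
    (all trajectories except those in one cluster)."""
    return {
        c: train_fn([t for t, l in zip(trajectories, labels) if l != c])
        for c in np.unique(labels)
    }


def generate_cluster_attribution(state, decision_fn, exp_policies):
    """Attribute the decision at `state` to clusters whose removal flips it."""
    original_action = decision_fn(state)
    return [c for c, pi in exp_policies.items() if pi(state) != original_action]


# Tiny demo on random data (first three stages only).
rng = np.random.default_rng(0)
trajectories = [rng.normal(size=(20, 4)) for _ in range(40)]
embeddings = encode_trajectories(trajectories)
labels = cluster_trajectories(embeddings, n_clusters=4)
data_embedding = generate_data_embedding(embeddings)
```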
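
The Open Datasets row points to the D4RL benchmark (Fu et al., 2020) and the D4RL-Atari repository. A minimal example of pulling HalfCheetah offline data with the public `d4rl` package is shown below; the particular dataset tag (`halfcheetah-medium-v2`) is an assumption, since the quoted text only says that 1000 trajectories were collected from D4RL.

```python
# Hedged example: loading HalfCheetah offline data via the public d4rl package.
# The specific task/quality tag ("halfcheetah-medium-v2") is an assumption;
# the paper only states that 1000 trajectories were taken from D4RL.
import gym
import d4rl  # importing d4rl registers the offline-RL environments with gym

env = gym.make("halfcheetah-medium-v2")
dataset = env.get_dataset()  # dict with 'observations', 'actions', 'rewards', ...

print(dataset["observations"].shape, dataset["actions"].shape)
```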
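
The Experiment Setup row quotes actor and critic learning rates of 3 × 10⁻⁴ with batch sizes of 256 and 512, and the Software Dependencies row names d3rlpy without a version. The sketch below plugs those numbers into a d3rlpy 1.x-style SAC constructor; the choice of SAC, the 1.x API, and the training-step count are assumptions rather than details reported in the paper.

```python
# Hedged sketch: the quoted hyperparameters plugged into a d3rlpy 1.x-style
# constructor. The algorithm (SAC) and the d3rlpy version are assumptions;
# the paper only mentions "d3rlpy implementations" without pinning either.
import d3rlpy

algo = d3rlpy.algos.SAC(
    actor_learning_rate=3e-4,   # quoted actor learning rate
    critic_learning_rate=3e-4,  # quoted critic learning rate
    batch_size=256,             # 512 was quoted for the second setting
    use_gpu=True,               # trainings were run on a single Nvidia A100
)
# algo.fit(mdp_dataset, n_steps=500_000)  # mdp_dataset: a d3rlpy MDPDataset placeholder
```

Note that newer d3rlpy releases route the same settings through a config object instead of constructor keywords, so the exact call depends on which version is installed.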