Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach
Authors: Yudi Zhang, Yali Du, Biwei Huang, Ziyan Wang, Jun Wang, Meng Fang, Mykola Pechenizkiy
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method outperforms state-of-the-art methods and the provided visualization further demonstrates the interpretability of our method. |
| Researcher Affiliation | Academia | 1 Eindhoven University of Technology, 2 King's College London, 3 University of California San Diego, 4 University College London, 5 University of Liverpool |
| Pseudocode | Yes | Algorithm 1 Learning the generative process and policy jointly. |
| Open Source Code | No | The project page is located at https://reedzyd.github.io/GenerativeReturnDecomposition/. The paper provides a link to a project page, not directly to a code repository. Although the project page may link to code, the direct link provided in the paper does not meet the criteria for a specific code repository. |
| Open Datasets | Yes | We evaluate our method on eight widely used classical robot control tasks in the MuJoCo environment [55], including Half-Cheetah, Ant, Walker2d, Humanoid, Swimmer, Hopper, Humanoid Standup, and Reacher tasks. |
| Dataset Splits | Yes | Validation is performed after every cycle, and the average metric is computed based on 10 test rollouts. |
| Hardware Specification | Yes | All experiments were conducted on an HPC system equipped with 128 Intel Xeon processors operating at a clock speed of 2.2 GHz and 5 terabytes of memory. |
| Software Dependencies | No | The information is insufficient. The paper mentions the Adam optimizer but does not specify version numbers for any software dependencies like programming languages, frameworks (e.g., PyTorch, TensorFlow), or libraries. |
| Experiment Setup | Yes | Table 3: The hyper-parameters used in the experiments for GRD. Table 4: The hyper-parameters. |