Interpretable Reward Redistribution in Reinforcement Learning: A Causal Approach

Authors: Yudi Zhang, Yali Du, Biwei Huang, Ziyan Wang, Jun Wang, Meng Fang, Mykola Pechenizkiy

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results show that our method outperforms state-of-the-art methods and the provided visualization further demonstrates the interpretability of our method."
Researcher Affiliation | Academia | 1 Eindhoven University of Technology, 2 King's College London, 3 University of California San Diego, 4 University College London, 5 University of Liverpool
Pseudocode | Yes | "Algorithm 1: Learning the generative process and policy jointly."
Open Source Code | No | The project page is located at https://reedzyd.github.io/GenerativeReturnDecomposition/. The paper links to a project page, not directly to a code repository. Although the project page may in turn link to code, the link given in the paper does not meet the criteria for a specific code repository.
Open Datasets | Yes | "We evaluate our method on eight widely used classical robot control tasks in the MuJoCo environment [55], including Half-Cheetah, Ant, Walker2d, Humanoid, Swimmer, Hopper, Humanoid Standup, and Reacher tasks." (See the environment sketch after the table.)
Dataset Splits | Yes | "Validation is performed after every cycle, and the average metric is computed based on 10 test rollouts." (See the evaluation sketch after the table.)
Hardware Specification | Yes | "All experiments were conducted on an HPC system equipped with 128 Intel Xeon processors operating at a clock speed of 2.2 GHz and 5 terabytes of memory."
Software Dependencies | No | The information is insufficient: the paper mentions the Adam optimizer but does not give version numbers for any software dependencies, such as the programming language, frameworks (e.g., PyTorch, TensorFlow), or libraries.
Experiment Setup | Yes | "Table 3: The table of the hyper-parameters used in the experiments for GRD." "Table 4: The hyper-parameters."
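
The Open Datasets row quotes eight MuJoCo control tasks. As a minimal sketch of how such a setup is commonly reproduced, assuming the Gymnasium package and its "-v4" MuJoCo task IDs (the paper specifies neither the library nor the environment version):

```python
import gymnasium as gym

# The eight MuJoCo tasks quoted above. The "-v4" Gymnasium IDs are an
# assumption; the paper does not state which environment version was used.
TASK_IDS = [
    "HalfCheetah-v4", "Ant-v4", "Walker2d-v4", "Humanoid-v4",
    "Swimmer-v4", "Hopper-v4", "HumanoidStandup-v4", "Reacher-v4",
]

envs = {task_id: gym.make(task_id) for task_id in TASK_IDS}
for task_id, env in envs.items():
    # Print the state/action dimensionality of each task.
    print(task_id, env.observation_space.shape, env.action_space.shape)
```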
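
The Dataset Splits row notes that validation averages a metric over 10 test rollouts. Below is a hedged sketch of that evaluation protocol, assuming the metric is episodic return and that the agent is exposed as a generic `policy(obs) -> action` callable; both are assumptions, as the quoted sentence defines neither.

```python
import gymnasium as gym
import numpy as np

def evaluate(env, policy, num_rollouts=10):
    """Average episodic return over `num_rollouts` test rollouts."""
    returns = []
    for _ in range(num_rollouts):
        obs, _ = env.reset()
        done, episode_return = False, 0.0
        while not done:
            action = policy(obs)  # hypothetical policy interface
            obs, reward, terminated, truncated, _ = env.step(action)
            episode_return += reward
            done = terminated or truncated
        returns.append(episode_return)
    return float(np.mean(returns))

# Usage example with a random policy on one of the quoted tasks.
env = gym.make("Hopper-v4")
print(evaluate(env, lambda obs: env.action_space.sample()))
```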