Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation

Authors: Yu Chen, Xiangcheng Zhang, Siwei Wang, Longbo Huang

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we provide the details of our numerical experiments. We construct a zero-mean MDP in which the expected return of every state-action pair is 0, so risk-neutral algorithms such as LSVI-UCB of Jin et al. (2020) will learn nothing. We also compare our results with the optimistic MDP algorithm of Bastani et al. (2022).
Researcher Affiliation | Collaboration | Yu Chen*¹, Xiangcheng Zhang*¹, Siwei Wang², Longbo Huang¹ [...] ¹Tsinghua University, Beijing, China; ²Microsoft Research Asia.
Pseudocode | Yes | Algorithm 1 RS-DisRL-M, Algorithm 2 RS-DisRL-V, Algorithm 4 M-Est-LSR(Θ, H^{k-1}, β_LSR), Algorithm 5 M-Est-MLE(Θ, H^{k-1}, β), Algorithm 6 RS-DisRL-Low-Rank-MDP(Θ, β), Algorithm 8 V-Est-LSR(H^{k-1}, F, π, γ_LSR), Algorithm 9 V-Est-MLE(H^k, Z, π, γ_MLE), Algorithm 10 RSRL-Linear-CVaR
Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide any links to a code repository.
Open Datasets | No | The paper states, 'We construct a zero-mean MDP...' for its numerical experiments but does not provide access information (link, DOI, or citation) for this or any other publicly available dataset.
Dataset Splits | No | The paper describes constructing an MDP for numerical experiments but does not specify any train/validation/test dataset splits or their percentages/counts.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the numerical experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CPLEX 12.4') required to replicate the experiments.
Experiment Setup | Yes | For simplicity, we constructed a toy MDP with S = 3, A = 2, d = 2, H = 6, M = 3.
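
To illustrate why a zero-mean MDP gives risk-neutral methods nothing to learn, the following Python sketch builds a toy MDP with the reported sizes S = 3, A = 2, H = 6 and computes exact return distributions for two fixed policies. The reward and transition choices (a "safe" action that always pays 0, a "risky" action paying ±(s+1) with equal probability, uniform transitions) are hypothetical and are not the paper's actual construction. d and M are not modeled because their roles are not defined in this summary. Lower-tail CVaR is used only as one common example of a risk measure, and the helper names (`return_distribution`, `cvar`) are illustrative.

```python
from collections import defaultdict

# Sizes from the reported setup; d and M are not modeled in this sketch.
S, A, H = 3, 2, 6


def reward_dist(s, a):
    """Hypothetical zero-mean reward distribution for (s, a) as {value: prob}."""
    if a == 0:
        return {0: 1.0}                       # safe action: always 0
    return {-(s + 1): 0.5, (s + 1): 0.5}      # risky action: +/-(s+1), mean 0


def transition(s, a):
    """Hypothetical transition kernel: uniform over next states."""
    return {s_next: 1.0 / S for s_next in range(S)}


def return_distribution(policy, s0=0):
    """Exact H-step return distribution under a deterministic policy(h, s).

    Backward recursion over distributions:
    Z_h(s) = R(s, a) + Z_{h+1}(s'),  s' ~ P(. | s, a),  Z_H(s) = 0.
    """
    Z = [{0: 1.0} for _ in range(S)]          # Z_H(s) is a point mass at 0
    for h in reversed(range(H)):
        new_Z = []
        for s in range(S):
            a = policy(h, s)
            # Mixture of next-step return distributions over s'.
            mix = defaultdict(float)
            for s_next, p in transition(s, a).items():
                for g, q in Z[s_next].items():
                    mix[g] += p * q
            # Convolve with the reward, which is independent of s' here.
            out = defaultdict(float)
            for r, pr in reward_dist(s, a).items():
                for g, q in mix.items():
                    out[r + g] += pr * q
            new_Z.append(dict(out))
        Z = new_Z
    return Z[s0]


def mean(dist):
    return sum(v * p for v, p in dist.items())


def cvar(dist, alpha):
    """Lower-tail CVaR_alpha: mean of the worst alpha fraction of returns."""
    remaining, total = alpha, 0.0
    for v in sorted(dist):
        take = min(dist[v], remaining)        # split an atom if needed
        total += take * v
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


safe = return_distribution(lambda h, s: 0)
risky = return_distribution(lambda h, s: 1)
print(mean(safe), mean(risky))            # both 0 up to floating-point error
print(cvar(safe, 0.1), cvar(risky, 0.1))  # 0.0 for safe, negative for risky
```

Both policies have an expected return of 0, so any comparison based only on means treats them as identical, while the lower-tail CVaR of the risky policy is strictly negative. Exact enumeration works here because the return support is small (integers in [-18, 18]); it grows with H and with the reward support.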