Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation
Authors: Yu Chen, Xiangcheng Zhang, Siwei Wang, Longbo Huang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we provide the details of our numerical experiments. We construct a zero-mean MDP where the expected return for all the state-action pairs is 0, thus risk-neutral algorithms such as LSVI-UCB of Jin et al. (2020) will learn nothing. We also compare our results with the optimistic MDP algorithm of Bastani et al. (2022). |
| Researcher Affiliation | Collaboration | Yu Chen*¹, Xiangcheng Zhang*¹, Siwei Wang², Longbo Huang¹ [...] ¹Tsinghua University, Beijing, China; ²Microsoft Research Asia. |
| Pseudocode | Yes | Algorithm 1 RS-DisRL-M, Algorithm 2 RS-DisRL-V, Algorithm 4 M-Est-LSR(Θ, Hk 1, βLSR), Algorithm 5 M-Est-MLE(Θ, Hk 1, β), Algorithm 6 RS-DisRL-Low-Rank-MDP(Θ, β), Algorithm 8 V-Est-LSR(Hk 1, F, π, γLSR), Algorithm 9 V-Est-MLE(Hk, Z, π, γMLE), Algorithm 10 RSRL-Linear-CVaR |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code, nor does it provide any links to a code repository. |
| Open Datasets | No | The paper states, 'We construct a zero-mean MDP...' for its numerical experiments but does not provide access information (link, DOI, or citation) for this or any other publicly available dataset. |
| Dataset Splits | No | The paper describes constructing an MDP for numerical experiments but does not specify any train/validation/test dataset splits or their percentages/counts. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the numerical experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CPLEX 12.4') required to replicate the experiments. |
| Experiment Setup | Yes | For simplicity we constructed a toy MDP with S = 3, A = 2, d = 2, H = 6, M = 3. |
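
The zero-mean construction quoted in the table can be illustrated with a small sketch. This is not the paper's MDP: the two reward distributions, the names `SAFE` and `RISKY`, the helper functions, and the risk level `alpha = 0.25` are all hypothetical, chosen only to show why a risk-neutral learner sees no difference between actions whose expected return is 0, while a CVaR-based criterion (one common risk measure) can still rank them.

```python
# Hypothetical single-step illustration; NOT the MDP used in the paper.
# Each distribution is a list of (reward, probability) pairs.
SAFE = [(-0.5, 0.5), (0.5, 0.5)]     # small symmetric spread, mean 0
RISKY = [(-3.0, 0.25), (1.0, 0.75)]  # rare large loss, mean 0


def mean(dist):
    """Expected reward of a discrete distribution."""
    return sum(p * x for x, p in dist)


def cvar(dist, alpha):
    """Lower-tail CVaR: average reward over the worst alpha-fraction of outcomes."""
    remaining = alpha
    total = 0.0
    for x, p in sorted(dist):        # iterate from worst reward to best
        take = min(p, remaining)     # probability mass taken from this outcome
        total += take * x
        remaining -= take
        if remaining <= 1e-12:
            break
    return total / alpha


alpha = 0.25  # assumed risk level
actions = {"safe": SAFE, "risky": RISKY}

# Both actions have the same mean, so a risk-neutral criterion cannot separate them.
means = {name: mean(d) for name, d in actions.items()}
# CVaR differs, so a risk-sensitive criterion prefers the action with the milder tail.
cvars = {name: cvar(d, alpha) for name, d in actions.items()}
best_action = max(cvars, key=cvars.get)

print(means, cvars, best_action)
```

Under these assumed distributions the means coincide, so any preference between the two actions has to come from the shape of the return distribution rather than its expectation.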