reproducibilityindex.ai

Provable Risk-Sensitive Distributional Reinforcement Learning with General Function Approximation

Authors: Yu Chen, Xiangcheng Zhang, Siwei Wang, Longbo Huang

ICML 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section we provide the details of our numerical experiments. We construct a zero-mean MDP where the expected return for all the state-action pairs are 0, thus risk-neutral algorithms such as LSVI-UCB of Jin et al. (2020) will learn nothing. We also compare our results with the optimistic MDP algorithm of Bastani et al. (2022).
Researcher Affiliation	Collaboration	Yu Chen*¹, Xiangcheng Zhang*¹, Siwei Wang², Longbo Huang¹ [...] ¹Tsinghua University, Beijing, China; ²Microsoft Research Asia.
Pseudocode	Yes	Algorithm 1 RS-DisRL-M, Algorithm 2 RS-DisRL-V, Algorithm 4 M-Est-LSR(Θ, Hk 1, βLSR), Algorithm 5 M-Est-MLE(Θ, Hk 1, β), Algorithm 6 RS-DisRL-Low-Rank-MDP(Θ, β), Algorithm 8 V-Est-LSR(Hk 1, F, π, γLSR), Algorithm 9 V-Est-MLE(Hk, Z, π, γMLE), Algorithm 10 RSRL-Linear-CVaR
Open Source Code	No	The paper does not contain any explicit statement about releasing source code, nor does it provide any links to a code repository.
Open Datasets	No	The paper states, 'We construct a zero-mean MDP...' for its numerical experiments but does not provide access information (link, DOI, or citation) for this or any other publicly available dataset.
Dataset Splits	No	The paper describes constructing an MDP for numerical experiments but does not specify any train/validation/test dataset splits or their percentages/counts.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the numerical experiments.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., 'PyTorch 1.9', 'CPLEX 12.4') required to replicate the experiments.
Experiment Setup	Yes	For simplicity we constructed a toy MDP with S = 3, A = 2, d = 2, H = 6, M = 3.
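
The zero-mean construction quoted in the table can be illustrated with a small sketch. This is not the paper's MDP: the reward spreads in `SPREAD`, the deterministic transition rule in `next_state`, and the lower-tail CVaR convention are all assumptions made for illustration. Only S = 3, A = 2, and H = 6 are taken from the Experiment Setup row; d and M are not modeled because the row does not say what they denote. The sketch shows why a risk-neutral learner sees no signal (every policy has expected return 0) while a risk-sensitive objective such as CVaR still separates policies.

```python
from collections import defaultdict

S, A, H = 3, 2, 6          # sizes quoted in the Experiment Setup row
SPREAD = [0.5, 1.0, 1.5]   # hypothetical per-state noise scale (assumption)


def reward_dist(s, a):
    """Return [(reward, prob)]; the mean is zero for every (s, a)."""
    if a == 0:
        return [(0.0, 1.0)]                       # safe action: always 0
    return [(-SPREAD[s], 0.5), (SPREAD[s], 0.5)]  # risky action: symmetric


def next_state(s, a):
    """Hypothetical deterministic transition rule (assumption)."""
    return (s + a + 1) % S


def return_distribution(policy, s0=0):
    """Exact distribution of the H-step return for a state-only policy."""
    dist = {(s0, 0.0): 1.0}  # maps (state, return so far) to probability
    for _ in range(H):
        new = defaultdict(float)
        for (s, g), p in dist.items():
            a = policy[s]
            for r, q in reward_dist(s, a):
                new[(next_state(s, a), g + r)] += p * q
        dist = new
    out = defaultdict(float)
    for (_, g), p in dist.items():
        out[g] += p          # marginalize out the final state
    return dict(out)


def mean(dist):
    return sum(g * p for g, p in dist.items())


def cvar(dist, alpha):
    """Average of the worst alpha-fraction of returns (lower tail)."""
    remaining, total = alpha, 0.0
    for g in sorted(dist):
        take = min(dist[g], remaining)
        total += take * g
        remaining -= take
        if remaining <= 0:
            break
    return total / alpha


safe = return_distribution([0, 0, 0])    # always take the safe action
risky = return_distribution([1, 1, 1])   # always take the risky action

print("mean:", mean(safe), mean(risky))
print("CVaR(0.25):", cvar(safe, 0.25), cvar(risky, 0.25))
```

Both policies have the same mean return, so comparing expectations cannot rank them, whereas the CVaR of the risky policy is strictly lower. Any risk measure applied to the full return distribution would serve the same illustrative purpose here.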