Risk-Aware Reinforcement Learning with Coherent Risk Measures and Non-linear Function Approximation
Authors: Thanh Lam, Arun Verma, Bryan Kian Hsiang Low, Patrick Jaillet
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we validate our theoretical results via empirical experiments on synthetic and real-world data. |
| Researcher Affiliation | Academia | Department of Computer Science, National University of Singapore, Republic of Singapore; Department of Electrical Engineering and Computer Science, MIT, USA; {chithanh, arun, lowkh}@comp.nus.edu.sg; jaillet@mit.edu |
| Pseudocode | Yes | RA-UCB (Risk-Aware Upper Confidence Bound). 1: Input: hyperparameters of the coherent risk measure $\rho$ (e.g., confidence level $\alpha \in (0, 1)$ for CVaR). 2: for episode $t = 1, 2, \ldots, T$ do 3: Receive the initial state $x_1^t$ and initialize $V_{H+1}^t$ as the zero function. 4: for step $h = H, \ldots, 1$ do 5: For $\tau \in [t-1]$, draw $m$ samples from the weak simulator and construct the response vector $y_h^t$ using Eq. (7). 6: Compute $\mu_h^t$ and $\sigma_h^t$ using Eq. (8). 7: Compute $Q_h^t$ and $V_h^t$ using Eq. (9). 8: end for 9: for step $h = 1, \ldots, H$ do 10: Take action $a_h^t \in \arg\max_{a \in \mathcal{A}} Q_h^t(x_h^t, a)$. 11: Observe reward $r_h(x_h^t, a_h^t)$ and the next state $x_{h+1}^t$. 12: end for 13: end for |
| Open Source Code | Yes | The code for these experiments is available in the supplementary material. |
| Open Datasets | No | The paper mentions "synthetic and real-world data" and states that the trading environment is "based on real historical exchange rates and volumes between EUR and USD" and customized from "ForexEnv in the python package gym-anytrading". It does not provide direct access links, DOIs, or citations to specific public datasets used for training. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not mention any specific hardware specifications (e.g., GPU/CPU models, memory, or cloud resources) used for running the experiments. |
| Software Dependencies | No | The paper mentions using "the RBF kernel and the Kernel Ridge regressor from Scikit-learn" and customizing an environment based on "the python package gym-anytrading" but does not specify version numbers for these software components. |
| Experiment Setup | Yes | We set the horizon of each episode to H = 30. ... In this experiment, we use m = 100 samples from the weak simulator to estimate the risk in Eq. (7). ... The robot does not know perturbation parameters (r = 0.3) and the obstacles positions, so it has to learn them online via interacting with the environment. |
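Step 5 of the RA-UCB pseudocode estimates a risk measure from $m$ weak-simulator samples (the Experiment Setup row uses $m = 100$). The sketch below is a minimal illustration assuming $\rho$ is CVaR at level $\alpha$ applied to the lower tail of sampled next-step values and estimated by the plug-in average of the worst $\lceil \alpha m \rceil$ samples. The function names and the response form $r + \rho(V_{h+1}(x'))$ are hypothetical and are not taken from the paper's Eq. (7).

```python
import math


def empirical_cvar(samples, alpha):
    """Plug-in CVaR estimate: mean of the lowest ceil(alpha * m) samples.

    Assumes larger values are better (rewards/values), so the risk-averse
    tail is the lower one. Tail conventions differ across papers.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must lie in (0, 1]")
    xs = sorted(float(s) for s in samples)
    k = max(1, math.ceil(alpha * len(xs)))  # number of tail samples kept
    return sum(xs[:k]) / k


def risk_aware_response(reward, next_values, alpha):
    """Hypothetical response y = r + CVaR_alpha(V_{h+1}(x')).

    next_values holds V_{h+1} evaluated at m next states drawn from a
    (hypothetical) weak simulator for one visited state-action pair.
    """
    return reward + empirical_cvar(next_values, alpha)
```

With `alpha = 1.0` the estimate reduces to the sample mean, i.e., the risk-neutral case; smaller `alpha` averages over an increasingly pessimistic tail.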
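Step 6 computes a mean $\mu_h^t$ and an uncertainty $\sigma_h^t$ (Eq. (8)), and the Software Dependencies row mentions the RBF kernel and the Kernel Ridge regressor from Scikit-learn. Because scikit-learn's `KernelRidge.predict` returns only point predictions, the following is a minimal NumPy sketch of a GP-style mean and standard deviation for kernel ridge regression with an RBF kernel over state-action features. The regularizer `lam`, the `lengthscale`, and the exact variance scaling are assumptions and may differ from the paper's Eq. (8).

```python
import numpy as np


def rbf(A, B, lengthscale=1.0):
    """RBF kernel matrix k(a, b) = exp(-||a - b||^2 / (2 * lengthscale^2))."""
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def krr_mean_std(Z, y, Zq, lam=1.0, lengthscale=1.0):
    """Kernel ridge mean and GP-style std at query points Zq.

    mu(z)      = k(z)^T (K + lam I)^{-1} y
    sigma^2(z) = k(z, z) - k(z)^T (K + lam I)^{-1} k(z), with k(z, z) = 1 for RBF.
    """
    K = rbf(Z, Z, lengthscale)
    Kq = rbf(Zq, Z, lengthscale)  # shape (n_query, n_train)
    A = K + lam * np.eye(K.shape[0])
    mu = Kq @ np.linalg.solve(A, np.asarray(y, dtype=float))
    # Row-wise quadratic form k(z)^T A^{-1} k(z); A is symmetric.
    quad = np.sum(Kq * np.linalg.solve(A, Kq.T).T, axis=1)
    return mu, np.sqrt(np.maximum(1.0 - quad, 0.0))
```

For the mean alone, `sklearn.kernel_ridge.KernelRidge(alpha=lam, kernel="rbf", gamma=1 / (2 * lengthscale**2))` should give the same predictions, since it also solves $(K + \lambda I)\,c = y$; the explicit version is shown so that the uncertainty term is visible.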
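Steps 7 and 10 of the pseudocode turn $\mu_h^t$ and $\sigma_h^t$ into an optimistic $Q_h^t$ and then act greedily on it. The sketch below assumes a standard UCB form $\min(\mu + \beta\sigma, H - h + 1)$ with per-step rewards in $[0, 1]$ and a finite action set; the exploration weight `beta` and the clipping range are assumptions and not necessarily the paper's Eq. (9).

```python
import numpy as np


def optimistic_q(mu, sigma, beta, h, H):
    """Hypothetical UCB estimate for step h of an H-step episode.

    Adds beta-scaled uncertainty to the mean, then clips to [0, H - h + 1],
    the value range when per-step rewards lie in [0, 1] (assumption).
    """
    ucb = np.asarray(mu, dtype=float) + beta * np.asarray(sigma, dtype=float)
    return np.clip(ucb, 0.0, H - h + 1)


def greedy_action(q_values):
    """Index of an action in argmax_a Q(x, a); ties go to the lowest index."""
    return int(np.argmax(q_values))
```

With $H = 30$ as in the Experiment Setup row, the cap is 30 at $h = 1$ and shrinks to 1 at the last step $h = 30$, so optimism cannot exceed the reward still obtainable in the episode.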