Scaling Up Robust MDPs using Function Approximation
Authors: Aviv Tamar, Shie Mannor, Huan Xu
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this work we employ a reinforcement learning approach to tackle this planning problem: we develop a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs. We show that the proposed method provably succeeds under certain technical conditions, and demonstrate its effectiveness through simulation of an option pricing problem. |
| Researcher Affiliation | Academia | Aviv Tamar (AVIVT@TX.TECHNION.AC.IL), Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel; Shie Mannor (SHIE@EE.TECHNION.AC.IL), Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel; Huan Xu (MPEXUH@NUS.EDU.SG), Mechanical Engineering Department, National University of Singapore, Singapore 117575, Singapore |
| Pseudocode | No | The paper describes algorithms using equations and text, but does not include formal pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | The Matlab code for these results is provided in the supplementary material. |
| Open Datasets | No | Our price fluctuation model $M$ follows a Bernoulli distribution (Cox et al., 1979), $x_{t+1} = \begin{cases} f_u x_t, & \text{w.p. } p \\ f_d x_t, & \text{w.p. } 1 - p \end{cases}$, where the up and down factors, $f_u$ and $f_d$, are constant. Our empirical evaluation proceeds as follows. In each experiment, we generate $N_{\text{data}}$ trajectories of length $T$ from the true model $M$. |
| Dataset Splits | No | The paper describes generating $N_{\text{data}}$, $N_{\text{sim}}$, and $N_{\text{test}}$ trajectories but does not specify distinct training, validation, and test splits with proportions or counts. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for the experiments. |
| Software Dependencies | No | The paper mentions 'Matlab code' but does not specify any version numbers for Matlab or any other software dependencies. |
| Experiment Setup | Yes | The parameters for the experiments are provided in the supplementary material, and were chosen to balance the different factors in the problem. Most importantly, we chose γ = 0.98 and a large uncertainty set such that Assumption 2 is severely violated. |
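
The price model quoted in the Open Datasets row (each step multiplies the price by the up factor with probability p, otherwise by the down factor) can be simulated in a few lines. The following is a minimal Python sketch, not the authors' Matlab code: the function name `simulate_prices`, the seed, the reading of "length T" as T prices per trajectory, and all numeric values are hypothetical and chosen only for illustration.

```python
import random


def simulate_prices(x0, f_u, f_d, p, T, n_traj, seed=0):
    """Sample n_traj price trajectories of T prices each.

    Each step multiplies the current price by f_u with probability p
    and by f_d with probability 1 - p. All argument names are
    illustrative assumptions, not taken from the paper's code.
    """
    rng = random.Random(seed)
    paths = []
    for _ in range(n_traj):
        path = [x0]
        for _ in range(T - 1):
            factor = f_u if rng.random() < p else f_d
            path.append(path[-1] * factor)
        paths.append(path)
    return paths


# Hypothetical parameter values; the paper's actual settings are in its
# supplementary material and are not reproduced here.
paths = simulate_prices(x0=1.0, f_u=1.02, f_d=0.98, p=0.5, T=20, n_traj=100)
print(len(paths), len(paths[0]))
```

Because the data are drawn from a generative model rather than loaded from a file, anyone with the model parameters can regenerate statistically equivalent trajectories, which is consistent with the "No" recorded for open datasets.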