Scaling Up Robust MDPs using Function Approximation

Authors: Aviv Tamar, Shie Mannor, Huan Xu

ICML 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this work we employ a reinforcement learning approach to tackle this planning problem: we develop a robust approximate dynamic programming method based on a projected fixed point equation to approximately solve large scale robust MDPs. We show that the proposed method provably succeeds under certain technical conditions, and demonstrate its effectiveness through simulation of an option pricing problem.
Researcher Affiliation | Academia | Aviv Tamar, AVIVT@TX.TECHNION.AC.IL, Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel; Shie Mannor, SHIE@EE.TECHNION.AC.IL, Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa 32000, Israel; Huan Xu, MPEXUH@NUS.EDU.SG, Mechanical Engineering Department, National University of Singapore, Singapore 117575, Singapore
Pseudocode | No | The paper describes algorithms using equations and text, but does not include formal pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | The Matlab code for these results is provided in the supplementary material.
Open Datasets | No | Our price fluctuation model $M$ follows a Bernoulli distribution (Cox et al., 1979), $x_{t+1} = \begin{cases} f_u x_t, & \text{w.p. } p, \\ f_d x_t, & \text{w.p. } 1-p, \end{cases}$ where the up and down factors, $f_u$ and $f_d$, are constant. Our empirical evaluation proceeds as follows. In each experiment, we generate $N_{\text{data}}$ trajectories of length $T$ from the true model $M$.
Dataset Splits | No | The paper describes generating $N_{\text{data}}$, $N_{\text{sim}}$, and $N_{\text{test}}$ trajectories but does not specify distinct training, validation, and test splits with proportions or counts.
Hardware Specification | No | The paper does not provide any specific hardware details used for the experiments.
Software Dependencies | No | The paper mentions 'Matlab code' but does not specify any version numbers for Matlab or any other software dependencies.
Experiment Setup | Yes | The parameters for the experiments are provided in the supplementary material, and were chosen to balance the different factors in the problem. Most importantly, we chose γ = 0.98 and a large uncertainty set such that Assumption 2 is severely violated.
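The Research Type row summarizes the method as approximate dynamic programming built on a projected fixed point equation, $\Phi w = \Pi T \Phi w$, where $T$ is a robust Bellman operator (worst case over an uncertainty set of transition models) and $\Pi$ projects onto the span of the features $\Phi$. The Python sketch below illustrates the general idea by plain projected fixed-point iteration for evaluating a fixed policy. It is not the authors' Matlab code. Our assumptions: the uncertainty set at each state is a finite list of candidate next-state distributions, the policy is already folded into the rewards and transitions, and the projection is an exact $\xi$-weighted least-squares fit. All function names are hypothetical.

```python
# Hypothetical sketch (not the paper's implementation): robust policy
# evaluation with linear features via projected fixed-point iteration
#   w_{k+1} = argmin_w || Phi w - T(Phi w_k) ||_xi
# ASSUMPTION: each state's uncertainty set is a finite list of
# candidate next-state distributions ("scenarios").

def robust_backup(v, rewards, scenarios, gamma):
    """(T v)(s) = r(s) + gamma * min over P in U(s) of sum_s' P(s') v(s')."""
    return [
        rewards[s] + gamma * min(sum(p * vj for p, vj in zip(P, v)) for P in scenarios[s])
        for s in range(len(v))
    ]

def solve(A, b):
    """Solve the small linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def project(target, phi, xi):
    """xi-weighted least squares: argmin_w sum_s xi(s) * (phi(s) . w - target(s))^2."""
    d, n = len(phi[0]), len(xi)
    A = [[sum(xi[s] * phi[s][i] * phi[s][j] for s in range(n)) for j in range(d)] for i in range(d)]
    b = [sum(xi[s] * phi[s][i] * target[s] for s in range(n)) for i in range(d)]
    return solve(A, b)

def robust_projected_evaluation(phi, xi, rewards, scenarios, gamma, iters=500, tol=1e-10):
    """Iterate w <- Pi T(Phi w) until the weights stop changing (or iters runs out)."""
    w = [0.0] * len(phi[0])
    for _ in range(iters):
        v = [sum(f * wi for f, wi in zip(row, w)) for row in phi]
        w_new = project(robust_backup(v, rewards, scenarios, gamma), phi, xi)
        if max(abs(a - b) for a, b in zip(w_new, w)) < tol:
            return w_new
        w = w_new
    return w

if __name__ == "__main__":
    # Two states, tabular features: the iteration reduces to robust value iteration.
    scen = [[[0.9, 0.1], [0.5, 0.5]], [[0.0, 1.0], [0.2, 0.8]]]
    print(robust_projected_evaluation([[1.0, 0.0], [0.0, 1.0]], [0.5, 0.5], [1.0, 0.0], scen, 0.5))
```

With tabular features the projection is the identity and the loop is ordinary robust value iteration, which contracts in the sup norm; for this toy problem the fixed point is $v = (4/3, 0)$. With genuinely compressed features the composed map $\Pi T$ need not be a contraction, which is consistent with the paper's statement that the method provably succeeds only under certain technical conditions.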
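The Open Datasets row notes that the data are simulated from the up/down price model $M$ rather than taken from a public dataset. A minimal Python sketch of that generation step follows. The values of $x_0$, $f_u$, $f_d$, $p$, $T$ and the random seed are hypothetical placeholders (the actual parameters are in the paper's supplementary material), and reading "length $T$" as $T$ transitions ($T+1$ prices) is our assumption. The helper `estimate_up_probability` is also hypothetical, included only to show that the generated paths carry the information needed to recover $p$.

```python
import random

def simulate_trajectories(n_data, T, x0, f_u, f_d, p, seed=0):
    """Generate n_data price paths from x_{t+1} = f_u*x_t (w.p. p) or f_d*x_t (w.p. 1-p).

    ASSUMPTION: each path contains T transitions, i.e. T + 1 prices starting at x0.
    """
    rng = random.Random(seed)  # private RNG so runs are reproducible
    paths = []
    for _ in range(n_data):
        x = x0
        path = [x]
        for _ in range(T):
            x = f_u * x if rng.random() < p else f_d * x
            path.append(x)
        paths.append(path)
    return paths

def estimate_up_probability(paths, f_u, f_d):
    """Hypothetical helper: fraction of steps whose price ratio is closer to f_u than to f_d."""
    ups = total = 0
    for path in paths:
        for a, b in zip(path, path[1:]):
            ratio = b / a
            ups += int(abs(ratio - f_u) < abs(ratio - f_d))
            total += 1
    return ups / total

if __name__ == "__main__":
    data = simulate_trajectories(n_data=1000, T=50, x0=1.0, f_u=1.1, f_d=0.9, p=0.6, seed=1)
    print(estimate_up_probability(data, 1.1, 0.9))  # should be close to 0.6
```

Classifying each step by the nearer factor, rather than testing `b / a == f_u` exactly, avoids spurious mismatches from floating-point rounding.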