Optimizing the CVaR via Sampling

Authors: Aviv Tamar, Yonatan Glassner, Shie Mannor

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated our approach empirically in an RL domain: learning a risk-sensitive policy for Tetris. To our knowledge, such a domain is beyond the reach of existing CVaR optimization approaches. Moreover, our empirical results show that optimizing the CVaR indeed results in useful risk-sensitive policies, and motivates the use of simulation-based optimization for risk-sensitive decision making.
Researcher Affiliation | Academia | Aviv Tamar, Yonatan Glassner, and Shie Mannor, Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa, Israel 32000
Pseudocode | Yes | Algorithm 1: GCVaR (a hedged sketch of the estimator follows the table)
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We examine Tetris as a test case for our algorithms. ... We used the regular 10 × 20 Tetris board with the 7 standard shapes (a.k.a. tetrominos).
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits, as it operates in a reinforcement learning setting based on simulation rather than static dataset partitioning.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We set α = 0.05 and N = 1000. ... Starting from a fixed policy parameter θ0, which was obtained by running several iterations of standard policy gradient (giving both methods a warm start), we ran both CVaRSGD and standard policy gradient for enough iterations such that both algorithms (approximately) converged. ... The score for clearing 1, 2, 3, and 4 lines is 1, 4, 8, and 16, respectively. In addition, we limited the maximum number of steps in the game to 1000. ... We used the softmax policy, with the feature set of Thiery and Scherrer (2009). (A rollout sketch based on this setup follows the table.)
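
The Pseudocode row refers to Algorithm 1 (GCVaR), a likelihood-ratio estimator of the CVaR gradient built from sampled trajectory returns. Below is a minimal NumPy sketch of an estimator in that spirit, assuming the empirical α-quantile of the sampled returns serves as the VaR estimate and that per-trajectory score functions (∇_θ log of the trajectory probability) are available; the function name and array layout are illustrative and not taken from the paper's code.

```python
import numpy as np

def gcvar_gradient(returns, score_functions, alpha):
    """Sketch of a GCVaR-style likelihood-ratio estimate of the CVaR gradient.

    returns         : shape (N,), total return of each sampled trajectory
    score_functions : shape (N, d), grad_theta log P_theta(trajectory_i)
    alpha           : CVaR level, e.g. 0.05 (the worst 5% of outcomes)
    """
    returns = np.asarray(returns, dtype=float)
    scores = np.asarray(score_functions, dtype=float)
    n = returns.shape[0]

    # Empirical alpha-quantile of the returns serves as the VaR estimate.
    var_hat = np.quantile(returns, alpha)

    # Only trajectories in the alpha-tail (return <= VaR) contribute,
    # weighted by how far below the VaR estimate they fall.
    weights = np.where(returns <= var_hat, returns - var_hat, 0.0)

    # Average the weighted score functions to obtain the gradient estimate.
    grad = (weights[:, None] * scores).sum(axis=0) / (alpha * n)
    return var_hat, grad
```

A gradient ascent step would then update θ ← θ + η · grad on each fresh batch of sampled trajectories.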
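
The Experiment Setup row mentions a softmax policy over the Thiery and Scherrer (2009) feature set, games capped at 1000 steps, α = 0.05, and N = 1000 sampled trajectories. The sketch below shows how one such trajectory and its score function could be generated; `env`, `features_fn`, and the `reset`/`step` interface are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax_action_probs(theta, action_features):
    """Softmax (Gibbs) policy over candidate actions with linear scores."""
    logits = action_features @ theta
    logits -= logits.max()              # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def sample_trajectory(env, theta, features_fn, max_steps=1000, rng=None):
    """Roll out one game and accumulate the policy's score function.

    `env` (with reset()/step()) and `features_fn` (board -> per-action
    feature matrix, e.g. Thiery & Scherrer-style features) are assumed
    interfaces for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = env.reset()
    total_return = 0.0
    score = np.zeros_like(theta, dtype=float)
    for _ in range(max_steps):          # games capped at 1000 steps
        feats = features_fn(state)      # shape (num_actions, d)
        probs = softmax_action_probs(theta, feats)
        a = rng.choice(len(probs), p=probs)
        # Score of the softmax policy:
        # grad_theta log pi(a|s) = phi(s, a) - sum_b pi(b|s) phi(s, b)
        score += feats[a] - probs @ feats
        state, reward, done = env.step(a)
        total_return += reward
        if done:
            break
    return total_return, score
```

A CVaRSGD iteration in this spirit would sample N = 1000 such trajectories per update and pass their returns and scores to `gcvar_gradient` above with α = 0.05.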