Optimizing the CVaR via Sampling
Authors: Aviv Tamar, Yonatan Glassner, Shie Mannor
AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluated our approach empirically in an RL domain: learning a risk-sensitive policy for Tetris. To our knowledge, such a domain is beyond the reach of existing CVaR optimization approaches. Moreover, our empirical results show that optimizing the CVaR indeed results in useful risk-sensitive policies, and motivates the use of simulation-based optimization for risk-sensitive decision making. |
| Researcher Affiliation | Academia | Aviv Tamar, Yonatan Glassner, and Shie Mannor; Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa, Israel 32000 |
| Pseudocode | Yes | Algorithm 1: GCVaR (see the estimator sketch after the table) |
| Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We examine Tetris as a test case for our algorithms. ... We used the regular 10 × 20 Tetris board with the 7 standard shapes (a.k.a. tetrominos). |
| Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits, as it operates in a reinforcement learning setting based on simulation rather than static dataset partitioning. |
| Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers. |
| Experiment Setup | Yes | We set α = 0.05 and N = 1000. ... Starting from a fixed policy parameter θ0, which was obtained by running several iterations of standard policy gradient (giving both methods a warm start), we ran both CVaRSGD and standard policy gradient for enough iterations such that both algorithms (approximately) converged. ... The score for clearing 1, 2, 3, and 4 lines is 1, 4, 8, and 16, respectively. In addition, we limited the maximum number of steps in the game to 1000. ... We used the softmax policy, with the feature set of Thiery and Scherrer (2009). (See the policy sketch after the table.) |
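
The pseudocode row cites Algorithm 1 (GCVaR). The sketch below is a minimal NumPy reading of that estimator: it takes the empirical α-quantile of the sampled returns and averages the per-trajectory likelihood-ratio terms over the α-tail, each weighted by how far its return falls below the quantile. Function names, array shapes, and the synthetic example are assumptions made for illustration; this is not the authors' code.

```python
import numpy as np

def gcvar_gradient(returns, score_functions, alpha):
    """CVaR gradient estimate in the spirit of GCVaR (Algorithm 1).

    returns         : shape (N,), total reward of each sampled trajectory.
    score_functions : shape (N, dim), per-trajectory likelihood-ratio term,
                      i.e. sum_t d/dtheta log pi(a_t | s_t; theta).
    alpha           : CVaR level, e.g. 0.05 (the worst alpha-fraction of returns).
    """
    returns = np.asarray(returns, dtype=float)
    scores = np.asarray(score_functions, dtype=float)
    N = returns.shape[0]
    # Empirical alpha-quantile (VaR) of the sampled returns.
    v_hat = np.quantile(returns, alpha)
    # Only trajectories in the alpha-tail contribute to the CVaR gradient.
    tail = returns <= v_hat
    weights = (returns - v_hat) * tail          # zero outside the tail
    grad = weights @ scores / (alpha * N)       # shape (dim,)
    return grad, v_hat

# Synthetic example: 1000 trajectories, 5 policy parameters.
rng = np.random.default_rng(0)
grad, v_hat = gcvar_gradient(rng.normal(size=1000),
                             rng.normal(size=(1000, 5)),
                             alpha=0.05)
```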
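
The experiment-setup row references a softmax policy over the feature set of Thiery and Scherrer (2009) and a CVaRSGD loop with α = 0.05 and N = 1000 trajectories, with games capped at 1000 steps. The self-contained sketch below shows one way such a linear softmax policy and its score function could look; the outer loop in the trailing comment is hypothetical (`rollout`, `warm_start_theta`, and `step_size` are illustrative names) and only indicates how the pieces would feed the estimator above.

```python
import numpy as np

def softmax_policy(theta, features):
    """Linear softmax policy: P(a | s) proportional to exp(theta . phi(s, a)).

    `features` has shape (num_actions, dim), holding phi(s, a) for each action.
    """
    logits = features @ theta
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def softmax_log_prob_grad(theta, features, action):
    """d/dtheta log pi(action | s; theta) for the linear softmax policy.

    Summing this over the steps of a trajectory yields the likelihood-ratio
    term consumed by the CVaR gradient estimator.
    """
    probs = softmax_policy(theta, features)
    return features[action] - probs @ features

# Hypothetical outer loop in the spirit of CVaRSGD (names are illustrative):
#   theta = warm_start_theta                      # from standard policy gradient
#   for k in range(num_iterations):
#       returns, scores = rollout(theta, n_trajectories=1000, max_steps=1000)
#       grad, _ = gcvar_gradient(returns, scores, alpha=0.05)
#       theta = theta + step_size * grad          # ascent: maximize the CVaR
```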