Optimizing the CVaR via Sampling

Authors: Aviv Tamar, Yonatan Glassner, Shie Mannor

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluated our approach empirically in an RL domain: learning a risk-sensitive policy for Tetris. To our knowledge, such a domain is beyond the reach of existing CVaR optimization approaches. Moreover, our empirical results show that optimizing the CVaR indeed results in useful risk-sensitive policies, and motivates the use of simulation-based optimization for risk-sensitive decision making.
Researcher Affiliation | Academia | Aviv Tamar, Yonatan Glassner, and Shie Mannor, Electrical Engineering Department, The Technion - Israel Institute of Technology, Haifa, Israel 32000
Pseudocode | Yes | Algorithm 1: GCVaR (a hedged sketch of the estimator follows the table)
Open Source Code | No | The paper does not contain an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We examine Tetris as a test case for our algorithms. ... We used the regular 10 × 20 Tetris board with the 7 standard shapes (a.k.a. tetrominos).
Dataset Splits | No | The paper does not provide specific training/validation/test dataset splits, as it operates in a reinforcement learning setting based on simulation rather than static dataset partitioning.
Hardware Specification | No | The paper does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers.
Experiment Setup | Yes | We set α = 0.05 and N = 1000. ... Starting from a fixed policy parameter θ0, which was obtained by running several iterations of standard policy gradient (giving both methods a warm start), we ran both CVaRSGD and standard policy gradient for enough iterations such that both algorithms (approximately) converged. ... The score for clearing 1, 2, 3, and 4 lines is 1, 4, 8, and 16, respectively. In addition, we limited the maximum number of steps in the game to 1000. ... We used the softmax policy, with the feature set of Thiery and Scherrer (2009). (A rollout sketch based on this setup follows the table.)
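
The Pseudocode row refers to Algorithm 1 (GCVaR), a likelihood-ratio estimator of the CVaR gradient built from sampled trajectory returns. Below is a minimal NumPy sketch of an estimator in that spirit, assuming the empirical α-quantile of the sampled returns serves as the VaR estimate and that per-trajectory score functions (∇_θ log of the trajectory probability) are available; the function name and array layout are illustrative and not taken from the paper's code.

```python
import numpy as np

def gcvar_gradient(returns, score_functions, alpha):
    """Sketch of a GCVaR-style likelihood-ratio estimate of the CVaR gradient.

    returns         : shape (N,), total return of each sampled trajectory
    score_functions : shape (N, d), grad_theta log P_theta(trajectory_i)
    alpha           : CVaR level, e.g. 0.05 (the worst 5% of outcomes)
    """
    returns = np.asarray(returns, dtype=float)
    scores = np.asarray(score_functions, dtype=float)
    n = returns.shape[0]

    # Empirical alpha-quantile of the returns serves as the VaR estimate.
    var_hat = np.quantile(returns, alpha)

    # Only trajectories in the alpha-tail (return <= VaR) contribute,
    # weighted by how far below the VaR estimate they fall.
    weights = np.where(returns <= var_hat, returns - var_hat, 0.0)

    # Average the weighted score functions to obtain the gradient estimate.
    grad = (weights[:, None] * scores).sum(axis=0) / (alpha * n)
    return var_hat, grad
```

A gradient ascent step would then update θ ← θ + η · grad on each fresh batch of sampled trajectories.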
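
The Experiment Setup row mentions a softmax policy over the Thiery and Scherrer (2009) feature set, games capped at 1000 steps, α = 0.05, and N = 1000 sampled trajectories. The sketch below shows how one such trajectory and its score function could be generated; `env`, `features_fn`, and the `reset`/`step` interface are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def softmax_action_probs(theta, action_features):
    """Softmax (Gibbs) policy over candidate actions with linear scores."""
    logits = action_features @ theta
    logits -= logits.max()              # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum()

def sample_trajectory(env, theta, features_fn, max_steps=1000, rng=None):
    """Roll out one game and accumulate the policy's score function.

    `env` (with reset()/step()) and `features_fn` (board -> per-action
    feature matrix, e.g. Thiery & Scherrer-style features) are assumed
    interfaces for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    state = env.reset()
    total_return = 0.0
    score = np.zeros_like(theta, dtype=float)
    for _ in range(max_steps):          # games capped at 1000 steps
        feats = features_fn(state)      # shape (num_actions, d)
        probs = softmax_action_probs(theta, feats)
        a = rng.choice(len(probs), p=probs)
        # Score of the softmax policy:
        # grad_theta log pi(a|s) = phi(s, a) - sum_b pi(b|s) phi(s, b)
        score += feats[a] - probs @ feats
        state, reward, done = env.step(a)
        total_return += reward
        if done:
            break
    return total_return, score
```

A CVaRSGD iteration in this spirit would sample N = 1000 such trajectories per update and pass their returns and scores to `gcvar_gradient` above with α = 0.05.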