Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

Authors: Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill (pp. 4436–4443)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further demonstrate that our algorithm finds CVaR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces." "Experimental Evaluation: We validate our algorithm empirically in three simulated environments against baseline approaches."
Researcher Affiliation | Academia | 1 Institute of Computational and Mathematical Engineering (ICME), Stanford University, California, USA; 2 Department of Computer Science, Stanford University, California, USA; 3 Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Pseudocode | Yes | Algorithm 1: CVaR-MDP
Open Source Code | No | The paper provides a link to an arXiv preprint (https://arxiv.org/abs/1911.01546) for a more detailed manuscript, but this link does not provide access to the source code for the described methodology.
Open Datasets | Yes | "We validate our algorithm empirically in three simulated environments... a machine replacement task (Delage and Mannor 2010), a well validated simulator for type 1 diabetes (Man et al. 2014) and a simulated treatment optimization task for HIV (Ernst et al. 2006)."
Dataset Splits | No | The paper does not provide explicit numerical training/validation/test splits (e.g., 80/10/10 percentages or sample counts) for the environments used. It mentions using 'adult#001' for hyperparameter optimization and then testing on other adults, which implies a patient-wise split but not a general dataset split with percentages.
Hardware Specification | No | The paper describes the neural network architectures used (e.g., "2 hidden layers of size 32 with ReLU activation"), but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions various algorithmic components and models (e.g., "ReLU activation", "softmax layer", "Real NVP"), but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | "Input: Parameters: γ, risk level α ∈ (0, 1), c > 0, density model ρ, ..." "This consists of 2 hidden layers of size 32 with ReLU activation for Diabetes 1 Treatment, and 4 hidden layers of size 128 with ReLU activation for HIV Treatment, both followed by a softmax layer for each action. The density model is a Real NVP (Dinh, Sohl-Dickstein, and Bengio 2016) with 3 hidden layers each of size 64. To provide a fair comparison, we evaluated across a number of schedules for reducing the ε parameter, and a small set of parameters (4-7) for the optimism value c for our method. For the Diabetes Treatment domain, hyperparameters are optimized only on adult#001."
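For context on the objective the paper targets: CVaR at risk level α ∈ (0, 1) is the expected return over the worst α-fraction of outcomes. The sketch below is not from the paper; it is a minimal empirical estimator (the function name `empirical_cvar` and the simplified handling of fractional tail sizes are our own assumptions):

```python
def empirical_cvar(returns, alpha):
    """Empirical CVaR at level alpha: the mean of the worst
    alpha-fraction of sampled returns (lower tail = low returns).

    Note: this rounds the tail size down to an integer, which is a
    simplification; exact CVaR estimators weight the boundary sample.
    """
    assert 0 < alpha <= 1, "risk level alpha must lie in (0, 1]"
    sorted_returns = sorted(returns)          # ascending: worst first
    k = max(1, int(len(sorted_returns) * alpha))  # tail sample count
    return sum(sorted_returns[:k]) / k

# With returns 1..10 and alpha = 0.2, the worst 20% is {1, 2},
# so the estimate is their mean, 1.5.
print(empirical_cvar(list(range(1, 11)), 0.2))  # → 1.5
```

At α = 1 this reduces to the ordinary mean, which is why small α makes the policy more conservative.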