Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy

Authors: Ramtin Keramati, Christoph Dann, Alex Tamkin, Emma Brunskill (pp. 4436–4443)

AAAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We further demonstrate that our algorithm finds CVaR-optimal policies substantially faster than existing baselines in several simulated environments with discrete and continuous state spaces." "Experimental Evaluation: We validate our algorithm empirically in three simulated environments against baseline approaches."
Researcher Affiliation | Academia | 1 Institute of Computational and Mathematical Engineering (ICME), Stanford University, California, USA; 2 Department of Computer Science, Stanford University, California, USA; 3 Machine Learning Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
Pseudocode | Yes | Algorithm 1: CVaR-MDP
Open Source Code | No | The paper provides a link to an arXiv preprint (https://arxiv.org/abs/1911.01546) for a more detailed manuscript, but this link does not provide access to the source code for the described methodology.
Open Datasets | Yes | "We validate our algorithm empirically in three simulated environments... a machine replacement task (Delage and Mannor 2010), a well validated simulator for type 1 diabetes (Man et al. 2014) and a simulated treatment optimization task for HIV (Ernst et al. 2006)."
Dataset Splits | No | The paper does not provide explicit numerical training/validation/test splits (e.g., 80/10/10 percentages or sample counts) for the environments used. It mentions using 'adult#001' for hyperparameter optimization and then testing on other adults, which implies a patient-wise split but not a general dataset split with percentages.
Hardware Specification | No | The paper describes the neural network architectures used (e.g., "2 hidden layers of size 32 with ReLU activation"), but it does not provide any specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments.
Software Dependencies | No | The paper mentions various algorithmic components and models (e.g., "ReLU activation", "softmax layer", "Real NVP"), but it does not specify any software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | "Input: Parameters: γ, risk level α ∈ (0, 1), c > 0, density model ρ, ..." "This consists of 2 hidden layers of size 32 with ReLU activation for Diabetes 1 Treatment, and 4 hidden layers of size 128 with ReLU activation for HIV Treatment, both followed by a softmax layer for each action. The density model is a Real NVP (Dinh, Sohl-Dickstein, and Bengio 2016) with 3 hidden layers each of size 64. To provide a fair comparison, we evaluated across a number of schedules for reducing the ε parameter, and a small set of parameters (4-7) for the optimism value c for our method. For the Diabetes Treatment domain, hyperparameters are optimized only on adult#001."
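For context on the objective the paper targets: CVaR at risk level α ∈ (0, 1) is the expected return over the worst α-fraction of outcomes. The sketch below is not from the paper; it is a minimal empirical estimator (the function name `empirical_cvar` and the simplified handling of fractional tail sizes are our own assumptions):

```python
def empirical_cvar(returns, alpha):
    """Empirical CVaR at level alpha: the mean of the worst
    alpha-fraction of sampled returns (lower tail = low returns).

    Note: this rounds the tail size down to an integer, which is a
    simplification; exact CVaR estimators weight the boundary sample.
    """
    assert 0 < alpha <= 1, "risk level alpha must lie in (0, 1]"
    sorted_returns = sorted(returns)          # ascending: worst first
    k = max(1, int(len(sorted_returns) * alpha))  # tail sample count
    return sum(sorted_returns[:k]) / k

# With returns 1..10 and alpha = 0.2, the worst 20% is {1, 2},
# so the estimate is their mean, 1.5.
print(empirical_cvar(list(range(1, 11)), 0.2))  # → 1.5
```

At α = 1 this reduces to the ordinary mean, which is why small α makes the policy more conservative.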