Reinforcement Learning of Risk-Constrained Policies in Markov Decision Processes
Authors: Tomáš Brázdil, Krishnendu Chatterjee, Petr Novotný, Jiří Vahala
AAAI 2020, pp. 9794-9801
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implemented RAlph and evaluated it on two sets of benchmarks. The first one is a modified, perfectly observable version of Hallway (Pineau et al. 2003; Smith and Simmons 2004)... As a second benchmark, we consider a controllable random walk (RW). The results are summarized in Table 1. |
| Researcher Affiliation | Academia | ¹Faculty of Informatics, Masaryk University, Brno, Czech Republic ({xbrazdil, petr.novotny, xvahala1}@fi.muni.cz); ²Institute of Science and Technology Austria, Klosterneuburg, Austria (Krishnendu.Chatterjee@ist.ac.at) |
| Pseudocode | Yes | Algorithm 1: Training and evaluation of RAlph. and Algorithm 2: The episode sampling of RAlph. |
| Open Source Code | Yes | Implementation can be found at https://github.com/snurkabill/MasterThesis/releases/tag/AAAI_release |
| Open Datasets | Yes | We implemented RAlph and evaluated it on two sets of benchmarks. The first one is a modified, perfectly observable version of Hallway (Pineau et al. 2003; Smith and Simmons 2004) |
| Dataset Splits | No | The paper describes training and evaluation phases in terms of episodes, but it does not specify explicit train/validation/test splits with percentages or counts; the benchmarks are simulated environments rather than fixed datasets. |
| Hardware Specification | Yes | The test configuration was: CPU: Intel Xeon E5-2620 v2@2.1GHz (24 cores); 8GB heap size; Debian 8. |
| Software Dependencies | No | The paper reports only the test configuration (Intel Xeon CPU, 8GB heap size, Debian 8); it does not list software dependencies such as libraries, frameworks, or version numbers. |
| Experiment Setup | Yes | Input: MDP M (with a horizon H), risk bound Δ, no. of training episodes m, batch size n (from Algorithm 1) and C is a suitable exploration constant, a parameter fixed in advance of the computation. and Both algorithms were evaluated over 1000 episodes, with a timeout of 1 hour per evaluation. A hedged sketch of this setup appears below the table. |
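
The Pseudocode and Experiment Setup rows together pin down the reported protocol: Algorithm 1 trains RAlph over m episodes with batch size n, Algorithm 2 samples the individual episodes, and the learned policy is then evaluated over 1000 episodes with a 1-hour timeout. The sketch below is a minimal, hypothetical rendering of that protocol, not the authors' implementation: the environment (`RandomWalkEnv`, loosely inspired by the paper's controllable random walk), the agent (`RiskConstrainedAgent`), and all parameter values (H, Δ, m, n, C) are illustrative stand-ins.

```python
"""Hedged sketch of the reported protocol; NOT the authors' code.

All names and numeric values below are hypothetical stand-ins chosen
to make the reported protocol concrete and runnable.
"""
import random
import time


class RandomWalkEnv:
    """Toy controllable random walk, loosely inspired by the RW benchmark."""

    def __init__(self, horizon: int):
        self.horizon = horizon

    def reset(self) -> int:
        self.pos, self.t = 0, 0
        return self.pos

    def step(self, action: int):
        # The agent's action biases an otherwise random +/-1 step.
        self.pos += action + random.choice([-1, 1])
        self.t += 1
        reward = 1.0 if self.pos > 0 else 0.0
        failed = self.pos <= -5                 # the "risk" event to bound
        done = failed or self.t >= self.horizon
        return self.pos, reward, failed, done


class RiskConstrainedAgent:
    """Placeholder policy; RAlph instead plans with an MCTS-style search
    using exploration constant C while respecting the risk bound Delta."""

    def __init__(self, delta: float, c_explore: float):
        self.delta, self.c_explore = delta, c_explore

    def act(self, state: int) -> int:
        return 1 if state <= 0 else random.choice([0, 1])

    def train_on_batch(self, episodes) -> None:
        pass  # learning update deliberately omitted in this sketch


def run_episode(agent, env):
    """Episode sampling in the spirit of Algorithm 2."""
    state, total, failed = env.reset(), 0.0, False
    while True:
        state, reward, fail, done = env.step(agent.act(state))
        total += reward
        failed = failed or fail
        if done:
            return total, failed


def train(agent, env, m: int, batch: int) -> None:
    """Training in the spirit of Algorithm 1: m episodes, an update every n."""
    buffer = []
    for _ in range(m):
        buffer.append(run_episode(agent, env))
        if len(buffer) >= batch:
            agent.train_on_batch(buffer)
            buffer.clear()


def evaluate(agent, env, episodes: int = 1000, timeout_s: int = 3600):
    """The reported evaluation: 1000 episodes, 1-hour wall-clock timeout."""
    start, returns, fails = time.monotonic(), [], 0
    for _ in range(episodes):
        if time.monotonic() - start > timeout_s:
            break  # per-evaluation timeout, as in the paper's setup
        total, failed = run_episode(agent, env)
        returns.append(total)
        fails += failed
    n = max(len(returns), 1)
    return sum(returns) / n, fails / n  # mean return, empirical risk


if __name__ == "__main__":
    env = RandomWalkEnv(horizon=20)                         # H: assumed
    agent = RiskConstrainedAgent(delta=0.1, c_explore=1.4)  # Delta, C: assumed
    train(agent, env, m=1000, batch=32)                     # m, n: assumed
    print(evaluate(agent, env, episodes=100, timeout_s=60))
```

In the paper, action selection comes from risk-constrained planning driven by the exploration constant C and the risk bound Δ; here `act` is a trivial placeholder so that the training and evaluation protocol itself stays in focus.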