Constrained episodic reinforcement learning in concave-convex and knapsack settings

Authors: Kianté Brantley, Miro Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun

NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in constrained episodic benchmarks. |
| Researcher Affiliation | Collaboration | Kianté Brantley, University of Maryland, kdbrant@cs.umd.edu; Miroslav Dudík, Microsoft Research, mdudik@microsoft.com; Thodoris Lykouris, Microsoft Research, thlykour@microsoft.com; Sobhan Miryoosefi, Princeton University, miryoosefi@cs.princeton.edu; Max Simchowitz, UC Berkeley, msimchow@berkeley.edu; Aleksandrs Slivkins, Microsoft Research, slivkins@microsoft.com; Wen Sun, Cornell University, ws455@cornell.edu |
| Pseudocode | No | The paper describes the algorithms and their components (e.g., CONRL, CONPLANNER) and how to solve the optimization problems as linear programs, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/miryoosefi/ConRL |
| Open Datasets | Yes | We run our experiments on two grid-world environments: Mars rover (Tessler et al., 2019) and Box (Leike et al., 2017). |
| Dataset Splits | No | The paper describes running experiments on grid-world environments and training over a number of trajectories, but it does not specify traditional dataset splits (e.g., training, validation, and test percentages or counts) as commonly seen in supervised learning. |
| Hardware Specification | No | The paper does not provide specific hardware details, such as the GPU or CPU models used to run the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers, such as the programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | The episode horizon H is 30 and the agent's action is perturbed with probability 0.1 to a random action. APPROPO focuses on the feasibility problem, so it requires specifying a lower bound on the reward, which we set to 0.3 for Mars rover and 0.1 for Box. |
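For reference, a minimal sketch of how the setup parameters reported in the Experiment Setup row could be collected in a configuration dictionary. The key names (`horizon`, `action_noise`, `reward_lower_bound`) are illustrative assumptions and are not taken from the authors' repository.

```python
# Illustrative sketch only: parameter names are assumptions, not the authors'
# configuration format (see https://github.com/miryoosefi/ConRL for the real code).
EXPERIMENT_CONFIG = {
    "Mars rover": {
        "horizon": 30,              # episode horizon H reported in the paper
        "action_noise": 0.1,        # probability the agent's action is replaced by a random action
        "reward_lower_bound": 0.3,  # reward lower bound required by APPROPO for this environment
    },
    "Box": {
        "horizon": 30,
        "action_noise": 0.1,
        "reward_lower_bound": 0.1,
    },
}

if __name__ == "__main__":
    # Print the per-environment settings as a quick sanity check.
    for env_name, cfg in EXPERIMENT_CONFIG.items():
        print(env_name, cfg)
```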