Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Constrained episodic reinforcement learning in concave-convex and knapsack settings
Authors: Kianté Brantley, Miro Dudik, Thodoris Lykouris, Sobhan Miryoosefi, Max Simchowitz, Aleksandrs Slivkins, Wen Sun
NeurIPS 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the proposed algorithm significantly outperforms these approaches in constrained episodic benchmarks. |
| Researcher Affiliation | Collaboration | Kianté Brantley University of Maryland EMAIL; Miroslav Dudík Microsoft Research EMAIL; Thodoris Lykouris Microsoft Research EMAIL; Sobhan Miryoosefi Princeton University EMAIL; Max Simchowitz UC Berkeley EMAIL; Aleksandrs Slivkins Microsoft Research EMAIL; Wen Sun Cornell University EMAIL |
| Pseudocode | No | The paper describes algorithms and their components (e.g., CONRL, CONPLANNER) and how to solve optimization problems as linear programs, but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/miryoosefi/ConRL |
| Open Datasets | Yes | We run our experiments on two grid-world environments Mars rover (Tessler et al., 2019) and Box (Leike et al., 2017). |
| Dataset Splits | No | The paper describes running experiments on grid-world environments and training over a number of trajectories, but it does not specify traditional dataset splits (e.g., training, validation, test percentages or counts) as commonly seen in supervised learning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models used for running the experiments. |
| Software Dependencies | No | The paper does not specify software dependencies with version numbers, such as programming languages, libraries, or frameworks used for implementation. |
| Experiment Setup | Yes | The episode horizon H is 30 and the agent's action is perturbed with probability 0.1 to a random action. APPROPO focuses on the feasibility problem, so it requires to specify a lower bound on the reward, which we set to 0.3 for Mars rover and 0.1 for Box. |
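
The Experiment Setup row can be illustrated with a minimal Python sketch of the action-perturbation rule. This is not the authors' implementation: the names `perturb_action` and `run_episode`, the four-action grid, and the choice to sample the replacement uniformly over all actions (so it may coincide with the intended one) are assumptions made for illustration. Only the horizon of 30 and the perturbation probability of 0.1 come from the table.

```python
import random

HORIZON = 30          # episode horizon H, from the Experiment Setup row
PERTURB_PROB = 0.1    # probability that the chosen action is replaced

# Hypothetical action set for a grid world (assumption, not from the paper).
ACTIONS = ["up", "down", "left", "right"]


def perturb_action(action, actions, rng, perturb_prob=PERTURB_PROB):
    """Return `action`, or a uniformly random action with probability `perturb_prob`."""
    if rng.random() < perturb_prob:
        # Assumption: the replacement is drawn uniformly from all actions.
        return rng.choice(actions)
    return action


def run_episode(policy, actions, rng, horizon=HORIZON):
    """Return the list of actions actually executed over one fixed-length episode."""
    executed = []
    for step in range(horizon):
        intended = policy(step)
        executed.append(perturb_action(intended, actions, rng))
    return executed


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for repeatable runs
    executed = run_episode(lambda step: "up", ACTIONS, rng)
    changed = sum(1 for a in executed if a != "up")
    print(f"{len(executed)} steps, {changed} visibly perturbed")
```

Seeding the generator keeps individual runs repeatable, which matters when comparing algorithms under the same noise.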