reproducibilityindex.ai

Batch Policy Learning under Constraints

Authors: Hoang Le, Cameron Voloshin, Yisong Yue

ICML 2019

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our algorithm achieves strong empirical results in different domains, including in a challenging problem of simulated car driving subject to multiple constraints such as lane keeping and smooth driving. We validate our algorithm and analysis with two experimental settings.
Researcher Affiliation	Academia	Hoang M. Le¹, Cameron Voloshin¹, Yisong Yue¹. ¹California Institute of Technology, Pasadena, CA. Correspondence to: Hoang M. Le <hmle@caltech.edu>.
Pseudocode	Yes	Algorithm 1: Meta-algo for Batch Constrained Learning; Algorithm 2: Constrained Batch Policy Learning; Algorithm 3: Fitted Q Evaluation: FQE(π, c)
Open Source Code	No	The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available.
Open Datasets	Yes	Environment & Data Collection. The environment is an 8x8 grid. The agent has 4 actions N,S,E,W at each state. The main goal is to navigate from a starting position to the goal. Each episode terminates when the agent reaches the goal or falls into a hole. The main cost function is defined as c = 1 if goal is reached, otherwise c = 0 everywhere. We simulate a non-optimal data gathering policy πD by adding random sub-optimal actions to the shortest path policy from any given state to goal. We run πD for 5000 trajectories to collect the behavior dataset D (with constraint cost measurement specified below).
Dataset Splits	No	The paper describes collecting a dataset D (e.g., 'We run πD for 5000 trajectories to collect the behavior dataset D'), and mentions 'test-time performance', but does not provide specific details on how this dataset is split into training, validation, and test sets (e.g., exact percentages or sample counts).
Hardware Specification	No	The paper does not provide any specific details about the hardware used to run the experiments, such as GPU or CPU models, memory, or cloud instance types.
Software Dependencies	No	The paper mentions specific algorithms and models (e.g., DDQN, FQI, FQE, CNNs) but does not list any specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, TensorFlow 2.x).
Experiment Setup	No	The paper mentions general settings such as 'maximum horizon of 1000 for each episode' and 'set the threshold for each constraint to 75% of the DDQN benchmark', but does not provide specific hyperparameters like learning rates, batch sizes, optimizer details, or detailed neural network architectures.
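
The data collection procedure quoted in the Open Datasets row can be illustrated with a minimal Python sketch. This is not the authors' code (none is released, per the Open Source Code row). Several details are assumptions made only for illustration: the hole layout (`HOLES`), the start and goal cells, deterministic transitions, the exploration rate `epsilon`, and the step cap `max_steps`. The goal cost follows the row as quoted and is exposed as the hypothetical parameter `goal_cost`; the constraint cost measurement is omitted because the quoted text does not specify it.

```python
import random
from collections import deque

N = 8  # 8x8 grid, as stated in the quoted text
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
START, GOAL = (0, 0), (7, 7)  # assumed positions, not given in the source
# Assumed hole layout for illustration only; the source does not list it.
HOLES = {(2, 3), (3, 5), (4, 3), (5, 1), (5, 2),
         (5, 6), (6, 1), (6, 4), (6, 6), (7, 3)}


def step(state, action):
    """Deterministic move (an assumption); stay in place at the grid edge."""
    dr, dc = ACTIONS[action]
    r, c = state[0] + dr, state[1] + dc
    return (r, c) if 0 <= r < N and 0 <= c < N else state


def shortest_path_policy():
    """Breadth-first search outward from the goal, avoiding holes.

    Returns a dict mapping each reachable safe cell to the action that
    moves it one step closer to the goal.
    """
    policy, seen, queue = {}, {GOAL}, deque([GOAL])
    while queue:
        cell = queue.popleft()
        for action, (dr, dc) in ACTIONS.items():
            prev = (cell[0] - dr, cell[1] - dc)  # cell that reaches `cell` via `action`
            in_grid = 0 <= prev[0] < N and 0 <= prev[1] < N
            if in_grid and prev not in HOLES and prev not in seen:
                seen.add(prev)
                policy[prev] = action
                queue.append(prev)
    return policy


def collect_dataset(n_trajectories=5000, epsilon=0.3, max_steps=200,
                    goal_cost=1.0, seed=0):
    """Run the noisy shortest-path policy and record transitions.

    Each record is (state, action, next_state, cost, done).
    """
    rng = random.Random(seed)
    policy = shortest_path_policy()
    dataset = []
    for _ in range(n_trajectories):
        state = START
        for _ in range(max_steps):
            # With probability epsilon take a random (possibly sub-optimal) action.
            if rng.random() < epsilon or state not in policy:
                action = rng.choice(list(ACTIONS))
            else:
                action = policy[state]
            nxt = step(state, action)
            done = nxt == GOAL or nxt in HOLES
            cost = goal_cost if nxt == GOAL else 0.0
            dataset.append((state, action, nxt, cost, done))
            if done:
                break
            state = nxt
    return dataset


if __name__ == "__main__":
    D = collect_dataset()
    print(len(D), "transitions collected")
```

Setting `epsilon=0.0` recovers the pure shortest-path policy, which is a quick way to check the environment logic before collecting noisy data.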