Switching the Loss Reduces the Cost in Batch Reinforcement Learning

Authors: Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvari

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Moreover, we empirically verify that FQI-LOG uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal."
Researcher Affiliation | Collaboration | 1University of Alberta, 2Cornell University, 3Netflix, Inc.
Pseudocode | Yes | Algorithm 1 FQI-LOG
Open Source Code | No | The paper does not provide any statement about releasing its own code, or a link to a repository, for the FQI-LOG implementation or the DRL variant.
Open Datasets | Yes | "For these experiments, we used two standard control tasks; mountain car and inverted pendulum."
Dataset Splits | No | The paper describes how training data is collected and used (e.g., "We train both FQI-LOG and FQI-SQ on the same batch datasets with the first n = [1, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30] × 10^3 trajectories"), but it does not specify explicit train/validation/test splits as percentages or sample counts.
Hardware Specification | No | The paper does not specify the GPU, CPU, or cloud hardware used for the experiments; it only gives general computing descriptions without specific model numbers.
Software Dependencies | No | The paper mentions software components such as the sigmoid function, the BFGS method, and (implicitly) DQN, which is typically implemented with a deep learning framework, but it does not give version numbers for any software dependencies (e.g., PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | "We first evaluate FQI-LOG and FQI-SQ on an episodic sparse cost variant of mountain car with episodes lasting for 800 steps."
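The "Pseudocode" row points to Algorithm 1, FQI-LOG: fitted Q-iteration trained with a logarithmic (cross-entropy) loss rather than the squared loss. Since the table notes that no code was released, the following is only a minimal sketch of that idea under stated assumptions: a linear-sigmoid Q-function, costs scaled to [0, 1], plain gradient descent in place of the BFGS method the paper mentions, and hypothetical names (fqi_log, featurize).

```python
# Minimal, hypothetical sketch of fitted Q-iteration with a log (cross-entropy)
# loss in the spirit of FQI-LOG; not the authors' released implementation.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fqi_log(dataset, featurize, num_actions, gamma=0.99, iterations=50,
            lr=0.1, inner_steps=200):
    """dataset: list of (s, a, cost, s_next, done) with costs scaled to [0, 1].

    featurize(s, a) -> 1-D feature vector (assumed, e.g. tile coding for mountain car).
    """
    s0, a0 = dataset[0][0], dataset[0][1]
    w = np.zeros_like(featurize(s0, a0), dtype=float)  # Q(s, a) = sigmoid(phi(s, a) @ w)

    def q_values(s):
        return np.array([sigmoid(featurize(s, a) @ w) for a in range(num_actions)])

    for _ in range(iterations):
        # Bootstrapped cost-to-go targets, clipped to [0, 1] so the log loss
        # (cross-entropy against a soft label) is well defined.
        phis, targets = [], []
        for s, a, c, s_next, done in dataset:
            backup = c if done else c + gamma * q_values(s_next).min()  # minimize cost
            phis.append(featurize(s, a))
            targets.append(np.clip(backup, 0.0, 1.0))
        phis, targets = np.asarray(phis), np.asarray(targets)

        # Fit Q to the targets with the log loss (gradient descent stands in for
        # the BFGS method mentioned in the paper).
        for _ in range(inner_steps):
            preds = sigmoid(phis @ w)
            w -= lr * phis.T @ (preds - targets) / len(targets)
    return w
```

Replacing the cross-entropy fit with a squared-error fit on the same bootstrapped targets would give a baseline analogous to the FQI-SQ comparator named in the table.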
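The "Dataset Splits" and "Experiment Setup" rows quote a protocol of training on growing prefixes of one fixed batch of trajectories from an 800-step sparse-cost mountain car. A rough, hypothetical sketch of that loop is below, reusing the fqi_log sketch above; the environment interface (env.reset/env.step returning a cost) and the behavior policy are assumptions about details the section does not specify.

```python
# Hypothetical sketch of the batch protocol quoted in the table: collect a fixed
# pool of trajectories once, then train FQI-LOG on growing prefixes of it.
def run_batch_experiment(env, behavior_policy, featurize, num_actions, horizon=800):
    # env.reset()/env.step(a) returning (next_state, cost, done) is an assumed
    # interface; standard Gym-style environments return rewards and extra fields.
    trajectories = []
    for _ in range(30_000):                       # largest prefix: 30 * 10^3 trajectories
        s, traj = env.reset(), []
        for _ in range(horizon):                  # episodes last for (up to) 800 steps
            a = behavior_policy(s)
            s_next, cost, done = env.step(a)      # sparse cost until the goal is reached
            traj.append((s, a, cost, s_next, done))
            s = s_next
            if done:
                break
        trajectories.append(traj)

    # Train on the first n * 10^3 trajectories for each n in the quoted grid.
    learned = {}
    for n in [1, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30]:
        prefix = [t for traj in trajectories[: n * 1000] for t in traj]
        learned[n] = fqi_log(prefix, featurize, num_actions)
    return learned
```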