Switching the Loss Reduces the Cost in Batch Reinforcement Learning

Authors: Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvari

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Moreover, we empirically verify that FQI-LOG uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal."
Researcher Affiliation | Collaboration | 1University of Alberta, 2Cornell University, 3Netflix, Inc.
Pseudocode | Yes | Algorithm 1 FQI-LOG
Open Source Code | No | The paper does not provide any statement about releasing its own code, or a link to a repository, for the FQI-LOG implementation or the DRL variant.
Open Datasets | Yes | "For these experiments, we used two standard control tasks; mountain car and inverted pendulum."
Dataset Splits | No | The paper describes how training data is collected and used (e.g., "We train both FQI-LOG and FQI-SQ on the same batch datasets with the first n = [1, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30] × 10^3 trajectories"), but it does not specify explicit train/validation/test splits as percentages or sample counts.
Hardware Specification | No | The paper does not specify the GPU, CPU, or cloud hardware used for the experiments; it only gives general computing descriptions without specific model numbers.
Software Dependencies | No | The paper mentions software components such as the sigmoid function, the BFGS method, and (implicitly) DQN, which is typically implemented with a deep learning framework, but it does not give version numbers for any software dependencies (e.g., PyTorch 1.x, TensorFlow 2.x).
Experiment Setup | Yes | "We first evaluate FQI-LOG and FQI-SQ on an episodic sparse cost variant of mountain car with episodes lasting for 800 steps."
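The "Pseudocode" row points to Algorithm 1, FQI-LOG: fitted Q-iteration trained with a logarithmic (cross-entropy) loss rather than the squared loss. Since the table notes that no code was released, the following is only a minimal sketch of that idea under stated assumptions: a linear-sigmoid Q-function, costs scaled to [0, 1], plain gradient descent in place of the BFGS method the paper mentions, and hypothetical names (fqi_log, featurize).

```python
# Minimal, hypothetical sketch of fitted Q-iteration with a log (cross-entropy)
# loss in the spirit of FQI-LOG; not the authors' released implementation.
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fqi_log(dataset, featurize, num_actions, gamma=0.99, iterations=50,
            lr=0.1, inner_steps=200):
    """dataset: list of (s, a, cost, s_next, done) with costs scaled to [0, 1].

    featurize(s, a) -> 1-D feature vector (assumed, e.g. tile coding for mountain car).
    """
    s0, a0 = dataset[0][0], dataset[0][1]
    w = np.zeros_like(featurize(s0, a0), dtype=float)  # Q(s, a) = sigmoid(phi(s, a) @ w)

    def q_values(s):
        return np.array([sigmoid(featurize(s, a) @ w) for a in range(num_actions)])

    for _ in range(iterations):
        # Bootstrapped cost-to-go targets, clipped to [0, 1] so the log loss
        # (cross-entropy against a soft label) is well defined.
        phis, targets = [], []
        for s, a, c, s_next, done in dataset:
            backup = c if done else c + gamma * q_values(s_next).min()  # minimize cost
            phis.append(featurize(s, a))
            targets.append(np.clip(backup, 0.0, 1.0))
        phis, targets = np.asarray(phis), np.asarray(targets)

        # Fit Q to the targets with the log loss (gradient descent stands in for
        # the BFGS method mentioned in the paper).
        for _ in range(inner_steps):
            preds = sigmoid(phis @ w)
            w -= lr * phis.T @ (preds - targets) / len(targets)
    return w
```

Replacing the cross-entropy fit with a squared-error fit on the same bootstrapped targets would give a baseline analogous to the FQI-SQ comparator named in the table.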
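The "Dataset Splits" and "Experiment Setup" rows quote a protocol of training on growing prefixes of one fixed batch of trajectories from an 800-step sparse-cost mountain car. A rough, hypothetical sketch of that loop is below, reusing the fqi_log sketch above; the environment interface (env.reset/env.step returning a cost) and the behavior policy are assumptions about details the section does not specify.

```python
# Hypothetical sketch of the batch protocol quoted in the table: collect a fixed
# pool of trajectories once, then train FQI-LOG on growing prefixes of it.
def run_batch_experiment(env, behavior_policy, featurize, num_actions, horizon=800):
    # env.reset()/env.step(a) returning (next_state, cost, done) is an assumed
    # interface; standard Gym-style environments return rewards and extra fields.
    trajectories = []
    for _ in range(30_000):                       # largest prefix: 30 * 10^3 trajectories
        s, traj = env.reset(), []
        for _ in range(horizon):                  # episodes last for (up to) 800 steps
            a = behavior_policy(s)
            s_next, cost, done = env.step(a)      # sparse cost until the goal is reached
            traj.append((s, a, cost, s_next, done))
            s = s_next
            if done:
                break
        trajectories.append(traj)

    # Train on the first n * 10^3 trajectories for each n in the quoted grid.
    learned = {}
    for n in [1, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30]:
        prefix = [t for traj in trajectories[: n * 1000] for t in traj]
        learned[n] = fqi_log(prefix, featurize, num_actions)
    return learned
```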