Switching the Loss Reduces the Cost in Batch Reinforcement Learning
Authors: Alex Ayoub, Kaiwen Wang, Vincent Liu, Samuel Robertson, James McInerney, Dawen Liang, Nathan Kallus, Csaba Szepesvári
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Moreover, we empirically verify that FQI-LOG uses fewer samples than FQI trained with squared loss on problems where the optimal policy reliably achieves the goal. |
| Researcher Affiliation | Collaboration | ¹University of Alberta, ²Cornell University, ³Netflix, Inc. |
| Pseudocode | Yes | Algorithm 1 FQI-LOG |
| Open Source Code | No | The paper does not provide any statement about releasing its own code or a link to a repository for the FQI-LOG implementation or the DRL variant. |
| Open Datasets | Yes | For these experiments, we used two standard control tasks: mountain car and inverted pendulum. |
| Dataset Splits | No | The paper describes how training data is collected and used (e.g., "We train both FQI-LOG and FQI-SQ on the same batch datasets with the first n = [1, 3, 6, 9, 12, 15, 18, 21, 24, 27, 30] × 10^3 trajectories"), but it does not specify explicit train/validation/test splits in terms of percentages or sample counts for model training. |
| Hardware Specification | No | The paper does not specify any particular GPU, CPU, or cloud hardware used for the experiments; it describes the computation only in general terms, with no specific model numbers. |
| Software Dependencies | No | The paper mentions software components such as the "sigmoid function" and the "BFGS method", and, implicitly, "DQN", which is typically implemented with a deep learning framework. However, it does not specify version numbers for these software dependencies (e.g., PyTorch 1.x, TensorFlow 2.x). |
| Experiment Setup | Yes | We first evaluate FQI-LOG and FQI-SQ on an episodic sparse cost variant of mountain car with episodes lasting for 800 steps. |
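The core change reported above is swapping the regression loss inside fitted Q-iteration: Algorithm 1 (FQI-LOG) fits Q-values with a log loss and a sigmoid link on costs in [0, 1], while FQI-SQ uses the standard squared loss. Since the paper releases no code, the following is only a minimal sketch of what one such iteration might look like; the function names (`fqi_step`, `q_value`), the feature map `phi`, the batch layout, and the discount `gamma` are illustrative assumptions, not the authors' implementation. The sigmoid and BFGS components are named in the Software Dependencies row above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable sigmoid


def fqi_step(theta, batch, phi, actions, gamma=0.99, loss="log"):
    """One regression step of fitted Q-iteration on a batch of transitions.

    loss="log" fits a sigmoid-linked Q-function with a binary-cross-entropy
    (log) loss on targets in [0, 1]; loss="sq" uses the usual squared loss.
    """
    s, a, c, s_next = batch  # states, actions, costs in [0, 1], next states

    # Bootstrapped target: cost plus discounted minimum next-state Q-value.
    q_next = np.stack(
        [q_value(theta, phi(s_next, b), loss) for b in actions], axis=1
    )
    target = np.clip(c + gamma * q_next.min(axis=1), 0.0, 1.0)

    X = phi(s, a)  # feature matrix for the observed state-action pairs

    def objective(w):
        z = X @ w
        if loss == "log":
            p = expit(z)   # squash linear scores into (0, 1)
            eps = 1e-12    # guard against log(0)
            return -np.mean(
                target * np.log(p + eps) + (1 - target) * np.log(1 - p + eps)
            )
        return np.mean((z - target) ** 2)

    # The summary mentions the BFGS method; scipy's BFGS serves as a stand-in.
    return minimize(objective, theta, method="BFGS").x


def q_value(theta, features, loss):
    """Predicted cost-to-go for the given features under the chosen loss/link."""
    z = features @ theta
    return expit(z) if loss == "log" else z
```

Under these assumptions, the comparison described in the Experiment Setup row would amount to running the same iteration loop twice, once with `loss="log"` and once with `loss="sq"`, on the same batch datasets of n × 10^3 trajectories for n in {1, 3, 6, ..., 30}.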