Parameterized Projected Bellman Operator
Authors: Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on several RL problems. |
| Researcher Affiliation | Academia | 1German Research Center for AI (DFKI), Germany 2Department of Computer Science, TU Darmstadt, Darmstadt, Germany 3Department of Electronics, Computer Science, and Bioengineering, Politecnico di Milano, Milano, Italy 4Hessian.ai, Germany 5Centre for Cognitive Science, TU Darmstadt, Darmstadt, Germany 6Center for Artificial Intelligence and Data Science, University of Würzburg, Würzburg, Germany |
| Pseudocode | Yes | Algorithm 1: Projected FQI & Projected DQN |
| Open Source Code | Yes | The code is available at https://github.com/theovincent/PBO |
| Open Datasets | Yes | We consider an offline setting, where we use ProFQI on car-on-hill (Ernst, Geurts, and Wehenkel 2005), and an online setting, where we use ProDQN on bicycle balancing (Randløv and Alstrøm 1998), and lunar lander (Brockman et al. 2016). |
| Dataset Splits | No | The paper mentions data collection and usage within the algorithm but does not specify train/validation/test dataset splits with percentages or sample counts for the empirical evaluation. |
| Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., CPU, GPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper mentions leveraging neural network parameterizations and neural network regression, but it does not specify any software dependencies with version numbers (e.g., specific deep learning frameworks like TensorFlow or PyTorch with their versions, or Python version). |
| Experiment Setup | No | The paper mentions general training parameters such as the number of Bellman iterations (K) and update strategies (e.g., periodic target network updates, soft updates) but does not provide specific hyperparameters like learning rates, batch sizes, optimizer types, or detailed network architectures for the experiments. |
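The Pseudocode row above refers to the paper's projected variants of FQI and DQN, in which a parameterized operator is learned to map Q-function parameters directly to the parameters of their Bellman update. The toy sketch below illustrates only that general idea, not the authors' implementation: it assumes a small, linearly parameterized Q-function and fits an affine operator (here called `pbo`) by least squares on synthetic (parameter, target-parameter) pairs; all names, dimensions, and the stand-in target function are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 4        # dimension of the Q-function parameter vector (illustrative)

# Stand-in for the projected Bellman step: in this toy setup we pretend the
# projection of T* Q_theta has parameters given by an affine map of theta.
A_true = 0.5 * rng.standard_normal((dim, dim))
b_true = rng.standard_normal(dim)

def bellman_target(theta):
    # Hypothetical "ground truth" parameters of the projected Bellman update.
    return A_true @ theta + b_true

# Collect (theta, target) pairs and fit the operator by linear regression.
thetas = rng.standard_normal((100, dim))
targets = np.array([bellman_target(t) for t in thetas])
X = np.hstack([thetas, np.ones((100, 1))])        # affine features
W, *_ = np.linalg.lstsq(X, targets, rcond=None)   # (dim + 1, dim) weights

def pbo(theta):
    # Learned operator: maps Q-function parameters to updated parameters.
    return np.append(theta, 1.0) @ W

# Applying the learned operator K times emulates K Bellman iterations
# without re-fitting a Q-function at each step.
theta = np.zeros(dim)
for _ in range(10):
    theta = pbo(theta)
```

Because the synthetic target is exactly affine, the least-squares fit recovers it closely; with neural parameterizations, as in the paper, both the Q-function and the operator would instead be trained by gradient descent.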