Parameterized Projected Bellman Operator

Authors: Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on several RL problems."
Researcher Affiliation | Academia | (1) German Research Center for AI (DFKI), Germany; (2) Department of Computer Science, TU Darmstadt, Darmstadt, Germany; (3) Department of Electronics, Computer Science, and Bioengineering, Politecnico di Milano, Milano, Italy; (4) Hessian.ai, Germany; (5) Centre for Cognitive Science, TU Darmstadt, Darmstadt, Germany; (6) Center for Artificial Intelligence and Data Science, University of Würzburg, Würzburg, Germany
Pseudocode | Yes | Algorithm 1: Projected FQI & Projected DQN (illustrative sketches of the operator and its training loop follow this table)
Open Source Code | Yes | The code is available at https://github.com/theovincent/PBO
Open Datasets | Yes | "We consider an offline setting, where we use ProFQI on car-on-hill (Ernst, Geurts, and Wehenkel 2005), and an online setting, where we use ProDQN on bicycle balancing (Randløv and Alstrøm 1998), and lunar lander (Brockman et al. 2016)."
Dataset Splits | No | The paper describes data collection and usage within the algorithm but does not specify train/validation/test splits with percentages or sample counts for the empirical evaluation.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., CPU or GPU models, memory).
Software Dependencies | No | The paper mentions neural network parameterizations and neural network regression, but it does not name software dependencies with version numbers (e.g., a deep learning framework such as TensorFlow or PyTorch, or the Python version).
Experiment Setup | No | The paper states general training parameters, such as the number of Bellman iterations K and target-update strategies (periodic target-network updates or soft updates; see the soft-update sketch below), but does not report specific hyperparameters such as learning rates, batch sizes, optimizer types, or detailed network architectures.
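For readers reconstructing the method from the text alone: the core idea of PBO is to learn an operator that maps Q-function parameters to the parameters of their Bellman backup, instead of regressing Q-values anew at every iteration. Below is a minimal JAX sketch of that idea under toy assumptions (a linear Q-function over fixed random features and an affine operator); all sizes and names such as `pbo_apply` and `pbo_loss` are illustrative and do not mirror the authors' repository.

```python
import jax
import jax.numpy as jnp

GAMMA = 0.99                                # discount factor (assumed)
STATE_DIM, FEAT_DIM, N_ACTIONS = 2, 8, 3    # toy sizes (assumed)

# Fixed random features for a linear Q-function Q(s, .; theta) = phi(s) @ theta.
PROJ = jax.random.normal(jax.random.PRNGKey(0), (STATE_DIM, FEAT_DIM))

def q_values(theta, s):
    # theta flattens a (FEAT_DIM, N_ACTIONS) weight matrix.
    return jnp.tanh(s @ PROJ) @ theta.reshape(FEAT_DIM, N_ACTIONS)

def pbo_apply(psi, theta):
    # Parameterized operator acting on parameter space; here a simple
    # affine map psi = (W, b), theta -> W @ theta + b.
    W, b = psi
    return W @ theta + b

def bellman_targets(theta, batch):
    # Standard one-step Bellman optimality targets under Q(.; theta).
    s, a, r, s_next, done = batch
    q_next = jax.vmap(lambda sn: q_values(theta, sn))(s_next)
    return r + GAMMA * (1.0 - done) * q_next.max(axis=-1)

def pbo_loss(psi, theta, batch):
    # Train the operator so that Q(.; pbo_apply(psi, theta)) regresses the
    # Bellman targets computed from Q(.; theta).
    s, a, r, s_next, done = batch
    theta_next = pbo_apply(psi, theta)
    q_pred = jax.vmap(lambda si, ai: q_values(theta_next, si)[ai])(s, a)
    targets = jax.lax.stop_gradient(bellman_targets(theta, batch))
    return jnp.mean((q_pred - targets) ** 2)
```

The point of learning `psi` rather than `theta` is that, once trained, the operator can be applied repeatedly (or taken toward its fixed point) to produce further Bellman iterations without solving a new regression problem each time.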
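Algorithm 1 (Projected FQI & Projected DQN) trains a single operator across several Bellman iterations. Building on the sketch above, here is a hedged outline of such a training loop; the iteration count `K`, the optimizer (optax's Adam), and the learning rate are assumptions, since the paper does not report them.

```python
import optax  # common JAX optimizer library; its use here is an assumption

def profqi_style_loss(psi, theta0, batch, K=3):
    # Accumulate the regression loss over K successive applications of the
    # operator, so one set of operator weights must be consistent across
    # Bellman iterations. K=3 is a toy value, not the paper's.
    loss, theta = 0.0, theta0
    for _ in range(K):
        loss = loss + pbo_loss(psi, theta, batch)
        theta = jax.lax.stop_gradient(pbo_apply(psi, theta))
    return loss

optimizer = optax.adam(1e-3)  # learning rate assumed; the paper omits it

def train_step(psi, opt_state, theta0, batch):
    grads = jax.grad(profqi_style_loss)(psi, theta0, batch)
    updates, opt_state = optimizer.update(grads, opt_state)
    return optax.apply_updates(psi, updates), opt_state
```

Initialize with `opt_state = optimizer.init(psi)` and call `train_step` per batch; the online ProDQN variant would interleave these steps with environment interaction and replay-buffer sampling.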
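The Experiment Setup row notes that the paper mentions periodic target-network updates and soft updates without reporting their coefficients. Soft (Polyak) updates are standard and look like the following in JAX; the value of `tau` is an assumption, not taken from the paper.

```python
def soft_update(target_params, online_params, tau=0.005):
    # Polyak averaging: target <- (1 - tau) * target + tau * online.
    return jax.tree_util.tree_map(
        lambda t, o: (1.0 - tau) * t + tau * o, target_params, online_params
    )
```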