Parameterized Projected Bellman Operator

Authors: Théo Vincent, Alberto Maria Metelli, Boris Belousov, Jan Peters, Marcello Restelli, Carlo D'Eramo

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we empirically showcase the benefits of PBO w.r.t. the regular Bellman operator on several RL problems."
Researcher Affiliation | Academia | (1) German Research Center for AI (DFKI), Germany; (2) Department of Computer Science, TU Darmstadt, Darmstadt, Germany; (3) Department of Electronics, Computer Science, and Bioengineering, Politecnico di Milano, Milano, Italy; (4) Hessian.ai, Germany; (5) Centre for Cognitive Science, TU Darmstadt, Darmstadt, Germany; (6) Center for Artificial Intelligence and Data Science, University of Würzburg, Würzburg, Germany
Pseudocode | Yes | Algorithm 1: Projected FQI & Projected DQN (illustrative sketches of the operator and its training loop follow this table)
Open Source Code | Yes | The code is available at https://github.com/theovincent/PBO
Open Datasets | Yes | "We consider an offline setting, where we use ProFQI on car-on-hill (Ernst, Geurts, and Wehenkel 2005), and an online setting, where we use ProDQN on bicycle balancing (Randløv and Alstrøm 1998), and lunar lander (Brockman et al. 2016)."
Dataset Splits | No | The paper describes data collection and usage within the algorithm but does not specify train/validation/test splits with percentages or sample counts for the empirical evaluation.
Hardware Specification | No | The paper does not specify the hardware used to run the experiments (e.g., CPU or GPU models, memory).
Software Dependencies | No | The paper mentions neural network parameterizations and neural network regression, but it does not name software dependencies with version numbers (e.g., a deep learning framework such as TensorFlow or PyTorch, or the Python version).
Experiment Setup | No | The paper states general training parameters, such as the number of Bellman iterations K and target-update strategies (periodic target-network updates or soft updates; see the soft-update sketch below), but does not report specific hyperparameters such as learning rates, batch sizes, optimizer types, or detailed network architectures.
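For readers reconstructing the method from the text alone: the core idea of PBO is to learn an operator that maps Q-function parameters to the parameters of their Bellman backup, instead of regressing Q-values anew at every iteration. Below is a minimal JAX sketch of that idea under toy assumptions (a linear Q-function over fixed random features and an affine operator); all sizes and names such as `pbo_apply` and `pbo_loss` are illustrative and do not mirror the authors' repository.

```python
import jax
import jax.numpy as jnp

GAMMA = 0.99                                # discount factor (assumed)
STATE_DIM, FEAT_DIM, N_ACTIONS = 2, 8, 3    # toy sizes (assumed)

# Fixed random features for a linear Q-function Q(s, .; theta) = phi(s) @ theta.
PROJ = jax.random.normal(jax.random.PRNGKey(0), (STATE_DIM, FEAT_DIM))

def q_values(theta, s):
    # theta flattens a (FEAT_DIM, N_ACTIONS) weight matrix.
    return jnp.tanh(s @ PROJ) @ theta.reshape(FEAT_DIM, N_ACTIONS)

def pbo_apply(psi, theta):
    # Parameterized operator acting on parameter space; here a simple
    # affine map psi = (W, b), theta -> W @ theta + b.
    W, b = psi
    return W @ theta + b

def bellman_targets(theta, batch):
    # Standard one-step Bellman optimality targets under Q(.; theta).
    s, a, r, s_next, done = batch
    q_next = jax.vmap(lambda sn: q_values(theta, sn))(s_next)
    return r + GAMMA * (1.0 - done) * q_next.max(axis=-1)

def pbo_loss(psi, theta, batch):
    # Train the operator so that Q(.; pbo_apply(psi, theta)) regresses the
    # Bellman targets computed from Q(.; theta).
    s, a, r, s_next, done = batch
    theta_next = pbo_apply(psi, theta)
    q_pred = jax.vmap(lambda si, ai: q_values(theta_next, si)[ai])(s, a)
    targets = jax.lax.stop_gradient(bellman_targets(theta, batch))
    return jnp.mean((q_pred - targets) ** 2)
```

The point of learning `psi` rather than `theta` is that, once trained, the operator can be applied repeatedly (or taken toward its fixed point) to produce further Bellman iterations without solving a new regression problem each time.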
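Algorithm 1 (Projected FQI & Projected DQN) trains a single operator across several Bellman iterations. Building on the sketch above, here is a hedged outline of such a training loop; the iteration count `K`, the optimizer (optax's Adam), and the learning rate are assumptions, since the paper does not report them.

```python
import optax  # common JAX optimizer library; its use here is an assumption

def profqi_style_loss(psi, theta0, batch, K=3):
    # Accumulate the regression loss over K successive applications of the
    # operator, so one set of operator weights must be consistent across
    # Bellman iterations. K=3 is a toy value, not the paper's.
    loss, theta = 0.0, theta0
    for _ in range(K):
        loss = loss + pbo_loss(psi, theta, batch)
        theta = jax.lax.stop_gradient(pbo_apply(psi, theta))
    return loss

optimizer = optax.adam(1e-3)  # learning rate assumed; the paper omits it

def train_step(psi, opt_state, theta0, batch):
    grads = jax.grad(profqi_style_loss)(psi, theta0, batch)
    updates, opt_state = optimizer.update(grads, opt_state)
    return optax.apply_updates(psi, updates), opt_state
```

Initialize with `opt_state = optimizer.init(psi)` and call `train_step` per batch; the online ProDQN variant would interleave these steps with environment interaction and replay-buffer sampling.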
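The Experiment Setup row notes that the paper mentions periodic target-network updates and soft updates without reporting their coefficients. Soft (Polyak) updates are standard and look like the following in JAX; the value of `tau` is an assumption, not taken from the paper.

```python
def soft_update(target_params, online_params, tau=0.005):
    # Polyak averaging: target <- (1 - tau) * target + tau * online.
    return jax.tree_util.tree_map(
        lambda t, o: (1.0 - tau) * t + tau * o, target_params, online_params
    )
```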