Prioritized Soft Q-Decomposition for Lexicographic Reinforcement Learning

Authors: Finn Rietz, Erik Schaffernicht, Stefan Heinrich, Johannes A. Stork

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the efficacy of our approach by presenting successful learning, reuse, and adaptation results for both low- and high-dimensional simulated robot control tasks, as well as offline learning results. In contrast to baseline approaches, PSQD does not trade off between conflicting subtasks or priority constraints and satisfies subtask priorities during learning.
Researcher Affiliation | Academia | Finn Rietz (Örebro University, Sweden); Erik Schaffernicht (Örebro University, Sweden); Stefan Heinrich (IT University of Copenhagen, Denmark); Johannes A. Stork (Örebro University, Sweden)
Pseudocode | Yes | A pictographic overview of our method as well as pseudocode can be found in supplementary material D: Algorithm 1 (Subtask pre-training with SQL) and Algorithm 2 (Incremental PSQD subtask adaptation). A hedged sketch of the SQL pre-training target referenced by Algorithm 1 appears after this table.
Open Source Code | Yes | A GitHub repository with the implementation of the algorithm, experiment setup with hyperparameters, and documentation is available here: https://github.com/frietz58/psqd/. The repository provides the complete PSQD implementation and can be used to reproduce the results in this paper.
Open Datasets | No | The paper describes using a custom 2D navigation environment and a simulated Franka Emika Panda joint-control task based on the Gymnasium Robotics package. It does not provide access information (link, or citation with author/year) for a publicly available dataset used for training; Gymnasium Robotics is a software package, not a dataset.
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It discusses pre-training, zero-shot composition, and adaptation, but without specific percentages or sample counts for data partitioning.
Hardware Specification | No | The paper describes simulated environments and control tasks but does not specify any hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using the "Gymnasium Robotics package" but does not provide specific version numbers for any software dependencies, libraries, or programming languages used in the implementation.
Experiment Setup | Yes | We normalize actions to unit length to bound the action space and penalize non-straight actions. The high-priority task r1 corresponds to obstacle avoidance and yields negative rewards in close proximity to the obstacle (see Fig. 1a): r1(s) = -σ² exp(-d² / (2l²)) if d > 0, and -β σ² exp(-d² / (2l²)) otherwise, where d is the obstacle distance (inferred from s), σ = 1 and l = 1 parameterize a squared exponential kernel, and β = 10 is an additional punishment for colliding with the obstacle. The auxiliary rewards r2 and r3 respectively yield negative rewards everywhere except in small areas at the top and at the right side of the environment: r2(s) = 0 if s.y > 7, -δ otherwise; r3(s) = 0 if s.x > 7, -δ otherwise, where we use δ = 5 in all our experiments.
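
The reward definitions in the Experiment Setup row translate directly into code. Below is a minimal Python sketch of r1, r2, and r3 as described above; the (x, y) state layout, the obstacle-distance computation, and the function names are illustrative assumptions and are not taken from the authors' repository.

```python
import numpy as np

# Hedged sketch of the 2D navigation rewards described in the Experiment Setup row.
SIGMA = 1.0   # kernel scale sigma
LENGTH = 1.0  # kernel length-scale l
BETA = 10.0   # additional collision punishment beta
DELTA = 5.0   # penalty used by the auxiliary rewards

def r1(obstacle_distance: float) -> float:
    """High-priority obstacle-avoidance reward (squared-exponential kernel)."""
    kernel = SIGMA ** 2 * np.exp(-obstacle_distance ** 2 / (2 * LENGTH ** 2))
    if obstacle_distance > 0:
        return -kernel
    return -BETA * kernel  # colliding with the obstacle: scaled punishment

def r2(y: float) -> float:
    """Auxiliary reward: zero only in a small area at the top (y > 7)."""
    return 0.0 if y > 7 else -DELTA

def r3(x: float) -> float:
    """Auxiliary reward: zero only in a small area at the right (x > 7)."""
    return 0.0 if x > 7 else -DELTA
```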
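The Pseudocode row references Algorithm 1, subtask pre-training with soft Q-learning (SQL). As a hedged illustration of the standard SQL target such pre-training typically relies on (following Haarnoja et al., 2017), the sketch below computes an importance-sampled soft value with a uniform proposal over a bounded action space; the network interface, sample count, temperature, and function names are assumptions, not the authors' implementation.

```python
import torch

def soft_value(q_net, next_obs, action_dim, alpha=1.0, n_samples=32):
    # Importance-sampled soft value:
    #   V(s') = alpha * log E_{a~U}[exp(Q(s', a) / alpha) / p(a)],
    # with a uniform proposal over [-1, 1]^action_dim (assumption).
    batch_size = next_obs.shape[0]
    actions = torch.rand(batch_size, n_samples, action_dim) * 2 - 1
    obs_rep = next_obs.unsqueeze(1).expand(-1, n_samples, -1)
    q_values = q_net(obs_rep, actions)  # assumed to return shape (batch, n_samples)
    log_uniform = -action_dim * torch.log(torch.tensor(2.0))  # log density of the proposal
    return alpha * (
        torch.logsumexp(q_values / alpha - log_uniform, dim=1)
        - torch.log(torch.tensor(float(n_samples)))
    )

def soft_bellman_target(reward, next_obs, q_target_net, action_dim, gamma=0.99, alpha=1.0):
    # One-step soft Bellman backup for subtask pre-training: r + gamma * V_soft(s').
    # Terminal-state masking is omitted for brevity.
    with torch.no_grad():
        return reward + gamma * soft_value(q_target_net, next_obs, action_dim, alpha)
```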