Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Temporal-Logic-Based Reward Shaping for Continuing Reinforcement Learning Tasks
Authors: Yuqian Jiang, Suda Bharadwaj, Bo Wu, Rishi Shah, Ufuk Topcu, Peter Stone
Pages: 7995-8003
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed method on three continuing tasks. In all cases, shaping speeds up the average-reward learning rate without any reduction in the performance of the learned policy compared to relevant baselines. |
| Researcher Affiliation | Collaboration | 1 Department of Computer Science, The University of Texas at Austin 2 Department of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin 3 Amazon 4 Sony AI |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All source code is available as supplementary material. |
| Open Datasets | Yes | We test the proposed framework in three continuing learning tasks: continual area sweeping (Ahmadi and Stone 2005; Shah et al. 2020), control of a cart pole in Open AI gym (Brockman et al. 2016), and motion-planning in a grid world (Mahadevan 1996). |
| Dataset Splits | No | No explicit details on training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) are provided. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running experiments are mentioned in the paper. |
| Software Dependencies | No | The paper mentions 'DQN-based deep average-reward RL approach' and 'Open AI Gym' but does not specify software dependencies with version numbers for reproducibility (e.g., specific library versions for PyTorch, TensorFlow, or the Gym environment). |
| Experiment Setup | Yes | In this scenario, the kitchen has the most cleaning needs, and the given formula is to always stay in the kitchen. ... The potential function Φ is constructed as Equation 7, where C = 1 and d(s, a) is the negative of the minimal distance between s and the kitchen and plus 1 if a gets closer to the kitchen. ... At every time step, there is a 0.2 probability that the current position of the human needs cleaning. There is also a 0.2 probability that a dirty cell becomes clean by itself with every step. The human moves randomly between the corridor and the top left room and has a speed of 1 cell per step. |
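
The experiment setup quoted in the last row can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on several assumptions that the excerpt does not confirm: cells are `(row, col)` tuples, distance is Manhattan distance, an action is represented by the cell it leads to, and self-cleaning is applied before new dirt is added. The names `step_dirt` and `shaping_distance` are hypothetical. Equation 7 is not reproduced on this page, so only the term d(s, a) is sketched, not the full potential function Φ.

```python
import random

# Probabilities taken from the quoted experiment setup.
DIRTY_PROB = 0.2       # the human's current cell starts needing cleaning
SELF_CLEAN_PROB = 0.2  # a dirty cell becomes clean by itself


def step_dirt(dirty, human_pos, rng):
    """Advance the dirt state by one time step and return the new set.

    Assumption: self-cleaning is resolved before new dirt is added;
    the excerpt does not specify the order of these two events.
    """
    # Each dirty cell independently stays dirty with probability 0.8.
    remaining = {c for c in sorted(dirty) if rng.random() >= SELF_CLEAN_PROB}
    # The human's current cell needs cleaning with probability 0.2.
    if rng.random() < DIRTY_PROB:
        remaining.add(human_pos)
    return remaining


def manhattan_to_region(pos, region):
    """Minimal Manhattan distance from a cell to any cell of a region."""
    return min(abs(pos[0] - r) + abs(pos[1] - c) for r, c in region)


def shaping_distance(pos, next_pos, kitchen):
    """Sketch of d(s, a): negative minimal distance from s to the kitchen,
    plus 1 if the action moves the robot closer to the kitchen.

    Assumption: the action is identified with the cell it leads to.
    """
    before = manhattan_to_region(pos, kitchen)
    after = manhattan_to_region(next_pos, kitchen)
    return -before + (1 if after < before else 0)


if __name__ == "__main__":
    rng = random.Random(0)
    kitchen = {(0, 0), (0, 1)}  # hypothetical kitchen cells
    print(shaping_distance((0, 3), (0, 2), kitchen))  # moving toward the kitchen
    print(step_dirt({(2, 2)}, (4, 4), rng))
```

Under these assumptions, a step toward the kitchen scores one unit higher than any other action taken from the same cell, which is the preference the quoted description of d(s, a) encodes.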