Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
# Continuous-Time Reward Machines
Authors: Amin Falah, Shibashis Guha, Ashutosh Trivedi
IJCAI 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we evaluate the performance of our proposed approaches across benchmark environments to assess their efficiency and effectiveness. ... Figure 2 presents the performance of each approach across four benchmarks. |
| Researcher Affiliation | Academia | Amin Falah¹, Shibashis Guha², and Ashutosh Trivedi¹; ¹University of Colorado Boulder, ²Tata Institute of Fundamental Research. EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper describes algorithms conceptually and mathematically, such as the Q-value update equations (1) and (2) and the counterfactual experience generation process. However, it does not include a clearly labeled "Pseudocode" or "Algorithm" block with structured steps. |
| Open Source Code | Yes | Our implementation can be accessed at: https://github.com/falahamin1/Continuous-Time-Reward-Machines.git |
| Open Datasets | No | The paper describes custom 'benchmark environments', such as an 'autonomous vehicle in an urban environment' and a 'treasure hunt', which are inspired by existing works, but no concrete access information (specific links, DOIs, or formal citations with authors/year) is provided for the datasets or environment definitions used in the experiments. For example, the autonomous vehicle environment is 'inspired by [Oumaima et al., 2020]', but no link to the environment or data itself is given. |
| Dataset Splits | No | The paper describes a reinforcement learning setup where an agent interacts with an environment over 'episodes' and 'steps', rather than using pre-defined train/test/validation splits from a static dataset. It does not provide specific dataset split information. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper does not list any specific software dependencies or library versions (e.g., Python, PyTorch, TensorFlow, or other solvers with version numbers) used for the implementation. |
| Experiment Setup | Yes | For all experiments, we set the discount parameter α = 0.001 and learning rate θ = 0.1. The RL algorithms follow an ϵ-greedy exploration strategy, with ϵ initially set to 0.7 and decaying exponentially at a rate of 0.01 to transition from exploration to exploitation. |
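The Pseudocode row above notes that the paper describes a "counterfactual experience generation process" only conceptually. For background, here is a minimal sketch of the standard counterfactual update for reward machines (one environment transition is replayed through every reward-machine state). The two-state toy machine, labels, and hyperparameter values below are invented for illustration and are not the paper's:

```python
# Illustrative sketch of counterfactual experience generation with a
# reward machine (RM). The toy two-state RM, its labels, and the
# hyperparameters are invented for this example, NOT taken from the paper.
from collections import defaultdict

RM_STATES = [0, 1]
ACTIONS = [0, 1]
ALPHA, GAMMA = 0.1, 0.99  # illustrative values only

def rm_step(u, label):
    """Toy RM transition: (next RM state, reward) for `label` seen in state `u`."""
    if u == 0 and label == "goal_a":
        return 1, 0.0   # first sub-goal reached, no reward yet
    if u == 1 and label == "goal_b":
        return 1, 1.0   # task complete, reward 1
    return u, 0.0       # otherwise self-loop with zero reward

def counterfactual_experiences(s, a, s_next, label):
    """One observed env transition yields one experience per RM state."""
    for u in RM_STATES:
        u_next, r = rm_step(u, label)
        yield (s, u), a, r, (s_next, u_next)

Q = defaultdict(float)  # Q over product states ((env_state, rm_state), action)

def q_update(s, a, s_next, label):
    """Tabular Q-learning applied to each counterfactual experience."""
    for state, act, r, nxt in counterfactual_experiences(s, a, s_next, label):
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(state, act)] += ALPHA * (r + GAMMA * best_next - Q[(state, act)])
```

The point of the counterfactual loop is sample efficiency: a single observed transition updates the Q-function at every reward-machine state, not just the one the agent currently occupies.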
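The quoted experiment setup (ϵ starting at 0.7, decaying exponentially at a rate of 0.01) can be sketched as follows. The decay formula and per-episode granularity are one common reading of that description, not something the quote pins down:

```python
# Sketch of the quoted epsilon-greedy schedule: epsilon starts at 0.7 and
# "decays exponentially at a rate of 0.01". The exact decay formula and the
# per-episode (vs. per-step) granularity are assumptions; the paper may differ.
import math
import random

EPS_START, DECAY_RATE = 0.7, 0.01  # values quoted in the setup

def epsilon(episode):
    """One common reading of exponential decay: eps(t) = eps0 * exp(-rate * t)."""
    return EPS_START * math.exp(-DECAY_RATE * episode)

def select_action(q_values, episode, rng=random):
    """Explore uniformly with probability epsilon(episode), else act greedily."""
    if rng.random() < epsilon(episode):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

Under this schedule ϵ falls below 0.1 after roughly 195 episodes, giving the transition from exploration to exploitation the quote describes.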