Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Imitation Learning with Temporal Logic Constraints

Authors: Zining Fan, He Zhu

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	This section empirically evaluates TiLoIL by addressing the following questions: (Q1) Does TiLoIL improve exploration in LTL-constrained tasks? (Q2) Are the new learning-from-demonstrations strategies in TiLoIL necessary to learn policies that align with the LTL constraints? ... Results. In Fig. 3, the x-axis shows the environment steps and the y-axis shows the cumulative returns under eventual discounting (App. H.3). ... Ablation Studies. We assess the individual contributions of (a) segmented imitation and (b) multistage discriminator learning to the overall performance.
Researcher Affiliation	Academia	Zining Fan (Rutgers University), He Zhu (Rutgers University)
Pseudocode	Yes	Algorithm 1: TiLoIL Main Algorithm
Open Source Code	Yes	The code for TiLoIL is available on https://github.com/RU-Automated-Reasoning-Group/TiLoIL.
Open Datasets	Yes	Our benchmarks, as visualized in Fig. 3, include tasks drawn from LCER [63] and DRL2 [6], complemented by new environments we developed to highlight exploration challenges in LTL tasks. ... Fetch: Fetch environments are based on the widely used Fetch robotic benchmark [17]. ... HalfCheetah [61], a standard environment in deep reinforcement learning.
Dataset Splits	Yes	Both TiLoIL and the baselines GAIL, PWIL, and SQIL are provided with only 5 demonstrations. ... We experiment with TiLoIL under different numbers of demonstrations and observed that it can effectively bootstrap learning without requiring many demonstrations. See Appendix G for details. ... This figure illustrates our method with different numbers of demonstrations. The lines represent 5, 10, and full-size demonstrations.
Hardware Specification	Yes	Each experimental run was conducted using NVIDIA Quadro RTX 6000 GPU.
Software Dependencies	No	Our codebase primarily utilizes NumPy [26] for numerical computations and Torch [49] for its autograd capabilities. Additionally, we partially automate the synthesis of LDBAs from LTL formulas using Rabinizer [40].
Experiment Setup	Yes	The architectures of the SAC networks are shown below: Actor Network: 4-layer MLP, hidden units (256, 256, 256); Critic Networks: 3-layer MLP, hidden units (256, 256); Discriminator Networks (Reward): 2-layer MLP, hidden units (32). Table 5: Hyperparameters for Q-learning and Soft Actor Critic. γ: 0.99; α: 0.2; buffer size: 1 × 10^6; batch size: 64; learning starts: 2000; τ: 1 × 10^-4; Q learning rate: 3 × 10^-4; actor learning rate: 3 × 10^-4; critic learning rate: 3 × 10^-4; discriminator learning rate: 3 × 10^-4; discriminator update frequency: 1 step; target network update frequency: 1 step.
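
The Experiment Setup row can be read as a small configuration. The sketch below is not taken from the authors' repository. It transcribes the Table 5 values on the assumption that the flattened exponents mean 1 × 10^6 for the buffer size, 1 × 10^-4 for τ, and 3 × 10^-4 for the learning rates. The names `SACConfig`, `mlp_layer_shapes`, and `soft_update` are hypothetical, as are the input and output sizes in the usage lines. It also assumes that an "n-layer MLP" counts the output layer, and that τ is used for a standard Polyak (soft) target update.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class SACConfig:
    """Values transcribed from Table 5; field names are illustrative only."""
    gamma: float = 0.99               # discount factor
    alpha: float = 0.2                # entropy coefficient
    buffer_size: int = 1_000_000      # assumed reading of "1 106"
    batch_size: int = 64
    learning_starts: int = 2000
    tau: float = 1e-4                 # assumed reading of "1 10 4"
    q_lr: float = 3e-4                # assumed reading of "3 10 4"
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    discriminator_lr: float = 3e-4
    discriminator_update_every: int = 1   # in environment steps
    target_update_every: int = 1          # in environment steps


def mlp_layer_shapes(
    in_dim: int, hidden: Sequence[int], out_dim: int
) -> List[Tuple[int, int]]:
    """Return (fan_in, fan_out) for each linear layer of an MLP."""
    dims = [in_dim, *hidden, out_dim]
    return list(zip(dims[:-1], dims[1:]))


def soft_update(
    target: Sequence[float], online: Sequence[float], tau: float
) -> List[float]:
    """Polyak averaging: target <- tau * online + (1 - tau) * target."""
    return [tau * w + (1.0 - tau) * t for t, w in zip(target, online)]


cfg = SACConfig()
# Placeholder dimensions (17 inputs, 6 outputs); not taken from the source.
actor_layers = mlp_layer_shapes(17, (256, 256, 256), 6)   # 4 linear layers
critic_layers = mlp_layer_shapes(17 + 6, (256, 256), 1)   # 3 linear layers
new_target = soft_update([0.0, 1.0], [1.0, 1.0], cfg.tau)
```

Under these assumptions, three hidden widths give the four linear layers listed for the actor and two hidden widths give the three listed for the critic. With τ this small and an update every step, the target network tracks the online network slowly.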