Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Imitation Learning with Temporal Logic Constraints

Authors: Zining Fan, He Zhu

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This section empirically evaluates TiLoIL by addressing the following questions: (Q1) Does TiLoIL improve exploration in LTL-constrained tasks? (Q2) Are the new learning-from-demonstrations strategies in TiLoIL necessary to learn policies that align with the LTL constraints? ... Results. In Fig. 3, the x-axis shows the environment steps and the y-axis shows the cumulative returns under eventual discounting (App. H.3). ... Ablation Studies. We assess the individual contributions of (a) segmented imitation and (b) multistage discriminator learning to the overall performance.
Researcher Affiliation | Academia | Zining Fan, Rutgers University, EMAIL; He Zhu, Rutgers University, EMAIL
Pseudocode | Yes | Algorithm 1: TiLoIL Main Algorithm
Open Source Code | Yes | The code for TiLoIL is available on https://github.com/RU-Automated-Reasoning-Group/TiLoIL.
Open Datasets | Yes | Our benchmarks, as visualized in Fig. 3, include tasks drawn from LCER [63] and DRL2 [6], complemented by new environments we developed to highlight exploration challenges in LTL tasks. ... Fetch: Fetch environments are based on the widely used Fetch robotic benchmark [17]. ... HalfCheetah [61], a standard environment in deep reinforcement learning.
Dataset Splits | Yes | Both TiLoIL and the baselines GAIL, PWIL, and SQIL are provided with only 5 demonstrations. ... We experiment with TiLoIL under different numbers of demonstrations and observed that it can effectively bootstrap learning without requiring many demonstrations. See Appendix G for details. ... This figure illustrates our method with different numbers of demonstrations. The lines represent 5, 10, and full-size demonstrations.
Hardware Specification | Yes | Each experimental run was conducted using NVIDIA Quadro RTX 6000 GPU.
Software Dependencies | No | Our codebase primarily utilizes NumPy [26] for numerical computations and Torch [49] for its autograd capabilities. Additionally, we partially automate the synthesis of LDBAs from LTL formulas using Rabinizer [40].
Experiment Setup | Yes | The architectures of the SAC networks are shown below: Actor Network: 4-layer MLP, hidden units (256, 256, 256); Critic Networks: 3-layer MLP, hidden units (256, 256); Discriminator Networks (Reward): 2-layer MLP, hidden units (32). Table 5: Hyperparameters for Q-learning and Soft Actor Critic (hyperparameter = value): γ = 0.99; α = 0.2; buffer size = 1 × 10^6; batch size = 64; learning starts = 2000; τ = 1 × 10^-4; Q learning rate = 3 × 10^-4; actor learning rate = 3 × 10^-4; critic learning rate = 3 × 10^-4; discriminator learning rate = 3 × 10^-4; discriminator update frequency = 1 step; target network update frequency = 1 step.
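
The reported setup can be collected into a small configuration sketch. This is an illustration only and is not taken from the TiLoIL codebase: the class and function names, the input and output dimensions, and the reading of an "N-layer MLP" as N linear layers (hidden layers plus one output layer) are all assumptions. The exponents follow the reconstructed values 1 × 10^6, 1 × 10^-4, and 3 × 10^-4, and τ is assumed to be the usual soft (Polyak) target-update coefficient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SACHyperparameters:
    """Values transcribed from Table 5; field names are hypothetical."""
    gamma: float = 0.99               # discount factor
    alpha: float = 0.2                # entropy temperature
    buffer_size: int = 1_000_000      # replay buffer capacity
    batch_size: int = 64
    learning_starts: int = 2000       # steps collected before updates begin
    tau: float = 1e-4                 # assumed: soft target-update coefficient
    q_lr: float = 3e-4
    actor_lr: float = 3e-4
    critic_lr: float = 3e-4
    discriminator_lr: float = 3e-4
    discriminator_update_every: int = 1   # in environment steps
    target_update_every: int = 1          # in environment steps


def mlp_layer_shapes(in_dim, hidden, out_dim):
    """Return (fan_in, fan_out) for each linear layer of an MLP."""
    dims = [in_dim, *hidden, out_dim]
    return list(zip(dims[:-1], dims[1:]))


def soft_update(target, online, tau):
    """Polyak averaging: target <- (1 - tau) * target + tau * online."""
    return [(1.0 - tau) * t + tau * o for t, o in zip(target, online)]


# Hypothetical observation and action sizes, chosen only for illustration.
OBS_DIM, ACT_DIM = 10, 4

actor_shapes = mlp_layer_shapes(OBS_DIM, (256, 256, 256), ACT_DIM)
critic_shapes = mlp_layer_shapes(OBS_DIM + ACT_DIM, (256, 256), 1)
discriminator_shapes = mlp_layer_shapes(OBS_DIM + ACT_DIM, (32,), 1)
```

Under this reading, the actor has four linear layers, the critic three, and the discriminator two, which matches the layer counts quoted above. With τ this small and an update every step, the target network trails the online network very slowly.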