Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Zero-Shot Offline Imitation Learning via Optimal Transport
Authors: Thomas Rupf, Marco Bagatella, Nico Gürtler, Jonas Frey, Georg Martius
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The resulting method can learn from offline, suboptimal data, and is capable of non-myopic, zero-shot imitation, as we demonstrate in complex, continuous benchmarks. ... This section constitutes an extensive empirical evaluation of ZILOT for zero-shot IL. We first describe our experimental settings, and then present qualitative and quantitative results, as well as an ablation study. We consider a selection of 30 tasks defined over 5 environments, as summarized below and described in detail in Appendices A and C. |
| Researcher Affiliation | Academia | 1Universität Tübingen, Tübingen, Germany 2MPI for Intelligent Systems, Tübingen, Germany 3ETH, Zürich, Switzerland. Correspondence to: Thomas Rupf <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 OT cost computation for ZILOT |
| Open Source Code | Yes | The code is available at https://github.com/martius-lab/zilot. |
| Open Datasets | Yes | We train our version of TD-MPC2 offline with the datasets detailed in table 5 for 600k steps. ... Table 5. Environment description. We detail the datasets used for training. Environment Dataset #Transitions ... fetch push WGCSL (Yang et al., 2022) (expert+random) ... pointmaze medium D4RL (Fu et al., 2021) (expert) |
| Dataset Splits | No | The problem setting assumes access to two datasets: $\mathcal{D}_\beta = \{(s_0^i, a_0^i, s_1^i, a_1^i, \dots)\}_{i=1}^{\lvert\mathcal{D}_\beta\rvert}$ consisting of full state-action trajectories from M, and $\mathcal{D}_E = \{(g_0^i, g_1^i, \dots)\}_{i=1}^{\lvert\mathcal{D}_E\rvert}$ containing demonstrations of an expert in the form of goal sequences, not necessarily abiding by the dynamics of M. (No explicit train/test/validation splits are mentioned for these datasets; $\mathcal{D}_\beta$ is used for training the world model and value function, and $\mathcal{D}_E$ for testing zero-shot imitation.) |
| Hardware Specification | Yes | ZILOT runs at 2 to 4Hz on an Nvidia RTX 4090 GPU, depending on the size of H and the size of the OT problem. ... Training took about 8 to 9 hours on a single Nvidia A100 GPU. |
| Software Dependencies | No | The paper mentions several algorithms and frameworks like TD-MPC2 (Hansen et al., 2024) and ICEM (Pinneri et al., 2020), and notes the Sinkhorn algorithm implementation is inspired by (Flamary et al., 2021) and (Cuturi et al., 2022), but it does not specify concrete version numbers for any software libraries or dependencies (e.g., Python, PyTorch, specific library versions). |
| Experiment Setup | Yes | For training we adopt all TD-MPC2 hyperparameters directly (see table 7). ... Table 7. TD-MPC2 Hyperparameters. ... Table 8. Hyperparameters used for iCEM (Pinneri et al., 2020). ... We run our Sinkhorn algorithm for r = 500 iterations with a regularization factor of ϵ = 0.02. ... Table 6. Environment details. We detail the goal abstraction ϕ, metric h, threshold ϵ, horizon H, maximum episode length Tmax, and discount factor γ used for each environment. |
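The Sinkhorn settings quoted above (r = 500 iterations, regularization ϵ = 0.02) can be illustrated with a minimal NumPy sketch of entropy-regularized optimal transport. This is not the paper's implementation (which builds on POT and OTT, per the Software Dependencies row); the cost matrix and marginals in the usage example are placeholders, not the paper's goal-space costs.

```python
import numpy as np

def sinkhorn(C, mu, nu, eps=0.02, n_iters=500):
    """Entropy-regularized OT plan via Sinkhorn iterations.

    C:  (n, m) cost matrix between source and target points.
    mu: (n,) source marginal weights, summing to 1.
    nu: (m,) target marginal weights, summing to 1.
    Returns the (n, m) transport plan P with row sums ~mu and
    column sums ~nu.
    """
    K = np.exp(-C / eps)  # Gibbs kernel of the regularized problem
    u = np.ones_like(mu)
    for _ in range(n_iters):
        # Alternate scaling updates: v = nu / (K^T u), u = mu / (K v)
        u = mu / (K @ (nu / (K.T @ u)))
    v = nu / (K.T @ u)
    return u[:, None] * K * v[None, :]

# Placeholder example: two points on each side, uniform marginals.
C = np.array([[0.0, 1.0],
              [1.0, 0.0]])
mu = np.array([0.5, 0.5])
nu = np.array([0.5, 0.5])
P = sinkhorn(C, mu, nu)
```

With a small ϵ such as 0.02, the plan concentrates on the cheap pairings; larger ϵ spreads mass more uniformly at the cost of a blurrier matching.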