Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Imitation Learning from a Single Temporally Misaligned Video
Authors: William Huey, Huaxiaoyue Wang, Anne Wu, Yoav Artzi, Sanjiban Choudhury
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments showing that the ORCA reward can effectively and efficiently train RL agents to achieve 4.5x improvement (0.11 → 0.50 average normalized return) for Meta-world tasks and 6.6x improvement (6.55 → 43.3 average return) for Humanoid-v4 tasks compared to the best frame-level matching approach. |
| Researcher Affiliation | Academia | Cornell University. Correspondence to: William Huey <EMAIL>, Huaxiaoyue (Yuki) Wang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 ORCA Rewards. |
| Open Source Code | No | The project website is at https://portal-cornell.github.io/orca/ |
| Open Datasets | Yes | Meta-World (Yu et al., 2020). Following Fu et al. (2024c), we use ten tasks from the Meta-world environment to evaluate the effectiveness of ORCA reward in the robotic manipulation domain. Humanoid. We define four tasks in the MuJoCo Humanoid-v4 environment (Todorov et al., 2012) to examine how well ORCA works with precise motion. |
| Dataset Splits | Yes | For Meta-world, we follow the RL setup in Fu et al. (2024c). We train DrQ-v2 (Yarats et al., 2021) with state-based input for 1M steps and evaluate the policy every 10k steps on 10 randomly seeded environments. For the Humanoid environment, we train SAC (Haarnoja et al., 2018) for 2M steps and evaluate the policy every 20k steps on 8 environments. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU, CPU models, memory details) used for running the experiments. It focuses on the software environments and models. |
| Software Dependencies | No | The paper mentions several software components and environments such as "DrQ-v2 (Yarats et al., 2021)", "SAC (Haarnoja et al., 2018)", "Meta-world (Yu et al., 2020)", "MuJoCo Humanoid-v4 environment (Todorov et al., 2012)", "ResNet50 (He et al., 2016)", "ImageNet-1K (Deng et al., 2009)", "LIV (Ma et al., 2023)", and "DINOv2 (Oquab et al., 2023)". However, it does not provide specific version numbers for any of these software dependencies, programming languages, or libraries. |
| Experiment Setup | Yes | Table 3. Training hyperparameters used for experiments on both environments (Meta-world DrQ-v2 / Humanoid SAC): total environment steps 1,000,000 / 2,000,000; learning rate 1e-4 / 1e-3; batch size 512 / 256; gamma (γ) 0.9 / 0.99; learning starts 500 / 6000; soft update coefficient 5e-3 / 5e-3; actor/critic architecture (256, 256) / (256, 256); episode length 125 or 175 / 120. |
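The improvement factors and Table 3 hyperparameters reported above can be restated as a minimal sketch. This is not the authors' code; the variable names and dictionary layout are illustrative, and only the numeric values come from the paper.

```python
# Sanity check of the improvement factors reported for the ORCA reward.
# Each pair is (best frame-level matching baseline, ORCA).
meta_world = (0.11, 0.50)   # average normalized return, Meta-world
humanoid = (6.55, 43.3)     # average return, Humanoid-v4

meta_world_gain = meta_world[1] / meta_world[0]
humanoid_gain = humanoid[1] / humanoid[0]

print(f"Meta-world gain: {meta_world_gain:.1f}x")  # ≈ 4.5x
print(f"Humanoid gain:   {humanoid_gain:.1f}x")    # ≈ 6.6x

# Training hyperparameters from Table 3, as a config sketch.
HYPERPARAMS = {
    "Meta-world (DrQ-v2)": {
        "total_env_steps": 1_000_000,
        "learning_rate": 1e-4,
        "batch_size": 512,
        "gamma": 0.9,
        "learning_starts": 500,
        "soft_update_coef": 5e-3,
        "actor_critic_arch": (256, 256),
        "episode_length": (125, 175),  # 125 or 175, task-dependent
    },
    "Humanoid (SAC)": {
        "total_env_steps": 2_000_000,
        "learning_rate": 1e-3,
        "batch_size": 256,
        "gamma": 0.99,
        "learning_starts": 6000,
        "soft_update_coef": 5e-3,
        "actor_critic_arch": (256, 256),
        "episode_length": 120,
    },
}
```

Both ratios round to the paper's reported 4.5x and 6.6x figures, confirming that the headline improvements are consistent with the quoted per-environment returns.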