Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Imitation Learning from a Single Temporally Misaligned Video
Authors: William Huey, Huaxiaoyue Wang, Anne Wu, Yoav Artzi, Sanjiban Choudhury
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments showing that the ORCA reward can effectively and efficiently train RL agents to achieve 4.5x improvement (0.11 → 0.50 average normalized return) for Meta-world tasks and 6.6x improvement (6.55 → 43.3 average return) for Humanoid-v4 tasks compared to the best frame-level matching approach. |
| Researcher Affiliation | Academia | Cornell University. Correspondence to: William Huey <EMAIL>, Huaxiaoyue (Yuki) Wang <EMAIL>. |
| Pseudocode | Yes | Algorithm 1 ORCA Rewards. |
| Open Source Code | No | The project website is at https://portal-cornell.github.io/orca/ |
| Open Datasets | Yes | Meta-World (Yu et al., 2020). Following Fu et al. (2024c), we use ten tasks from the Meta-world environment to evaluate the effectiveness of ORCA reward in the robotic manipulation domain. Humanoid. We define four tasks in the MuJoCo Humanoid-v4 environment (Todorov et al., 2012) to examine how well ORCA works with precise motion. |
| Dataset Splits | Yes | For Meta-world, we follow the RL setup in Fu et al. (2024c). We train DrQ-v2 (Yarats et al., 2021) with state-based input for 1M steps and evaluate the policy every 10k steps on 10 randomly seeded environments. For the Humanoid environment, we train SAC (Haarnoja et al., 2018) for 2M steps and evaluate the policy every 20k steps on 8 environments. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware components (e.g., GPU, CPU models, memory details) used for running the experiments. It focuses on the software environments and models. |
| Software Dependencies | No | The paper mentions several software components and environments such as "DrQ-v2 (Yarats et al., 2021)", "SAC (Haarnoja et al., 2018)", "Meta-world (Yu et al., 2020)", "MuJoCo Humanoid-v4 environment (Todorov et al., 2012)", "ResNet50 (He et al., 2016)", "ImageNet-1K (Deng et al., 2009)", "LIV (Ma et al., 2023)", and "DINOv2 (Oquab et al., 2023)". However, it does not provide specific version numbers for any of these software dependencies, programming languages, or libraries. |
| Experiment Setup | Yes | Table 3. Training hyperparameters used for experiments on both environments (Meta-world DrQ-v2 / Humanoid SAC): total environment steps 1,000,000 / 2,000,000; learning rate 1e-4 / 1e-3; batch size 512 / 256; gamma (γ) 0.9 / 0.99; learning starts 500 / 6000; soft update coefficient 5e-3 / 5e-3; actor/critic architecture (256, 256) / (256, 256); episode length 125 or 175 / 120. |
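The improvement factors and Table 3 hyperparameters reported above can be restated as a minimal sketch. This is not the authors' code; the variable names and dictionary layout are illustrative, and only the numeric values come from the paper.

```python
# Sanity check of the improvement factors reported for the ORCA reward.
# Each pair is (best frame-level matching baseline, ORCA).
meta_world = (0.11, 0.50)   # average normalized return, Meta-world
humanoid = (6.55, 43.3)     # average return, Humanoid-v4

meta_world_gain = meta_world[1] / meta_world[0]
humanoid_gain = humanoid[1] / humanoid[0]

print(f"Meta-world gain: {meta_world_gain:.1f}x")  # ≈ 4.5x
print(f"Humanoid gain:   {humanoid_gain:.1f}x")    # ≈ 6.6x

# Training hyperparameters from Table 3, as a config sketch.
HYPERPARAMS = {
    "Meta-world (DrQ-v2)": {
        "total_env_steps": 1_000_000,
        "learning_rate": 1e-4,
        "batch_size": 512,
        "gamma": 0.9,
        "learning_starts": 500,
        "soft_update_coef": 5e-3,
        "actor_critic_arch": (256, 256),
        "episode_length": (125, 175),  # 125 or 175, task-dependent
    },
    "Humanoid (SAC)": {
        "total_env_steps": 2_000_000,
        "learning_rate": 1e-3,
        "batch_size": 256,
        "gamma": 0.99,
        "learning_starts": 6000,
        "soft_update_coef": 5e-3,
        "actor_critic_arch": (256, 256),
        "episode_length": 120,
    },
}
```

Both ratios round to the paper's reported 4.5x and 6.6x figures, confirming that the headline improvements are consistent with the quoted per-environment returns.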