Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Articulate-Anything: Automatic Modeling of Articulated Objects via a Vision-Language Foundation Model

Authors: Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, Eric Eaton

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In extensive quantitative experiments on the standard PartNet-Mobility dataset, ARTICULATE-ANYTHING substantially outperforms prior work, increasing the success rate from 8.7-12.2% to 75% and setting a new bar for state-of-the-art performance.
Researcher Affiliation	Academia	Long Le, Jason Xie, William Liang, Hung-Ju Wang, Yue Yang, Yecheng Jason Ma, Kyle Vedder, Arjun Krishna, Dinesh Jayaraman, Eric Eaton; University of Pennsylvania
Pseudocode	Yes	/ Code 1: Joint failure attribution. /
Open Source Code	Yes	Full video demonstrations and source code are available on the website.
Open Datasets	Yes	Datasets: We use the PartNet-Mobility dataset (Mo et al., 2018), which includes human annotations for 2.3K objects, 1.9K revolute joints, and 7.6K prismatic joints.
Dataset Splits	Yes	We evaluate the performance of these five (in-distribution) and the remaining 41 (out-of-distribution) classes.
Hardware Specification	No	No specific hardware details (GPU or CPU models, or detailed computer specifications) for running the experiments were provided.
Software Dependencies	No	The paper mentions several software components, such as Google's Gemini Flash-1.5, PyBullet, Sapien, CoTracker, and Stable-Baselines3, but does not provide specific version numbers for these libraries or frameworks, except for the VLM model name.
Experiment Setup	Yes	We use few-shot prompting with around 20 in-context examples. The position threshold is set to 50 mm and the angular threshold to 0.25 radian (about 14.3 degrees). This process terminates when the rating exceeds a threshold of 5. We train a Franka arm to perform four robotic manipulation tasks in the Robosuite simulator using PPO and our generated assets. The policy outputs joint and gripper positions. We train policies over 3 random seeds per task for 2 million environment steps using PPO in the Stable-Baselines3 library (Raffin et al., 2021). We randomize physics (friction, damping, frictionloss, etc.), object scales, and poses to obtain robust policies.
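
The position and angular thresholds quoted in the Experiment Setup row can be illustrated with a minimal sketch of a joint-prediction check. Only the two threshold values come from the row above. The function names, the use of joint-origin distance as the position error, and the choice to ignore the sign of the axis direction are all assumptions made for illustration, and the paper's actual evaluation code may differ.

```python
import math

# Threshold values quoted in the Experiment Setup row.
POSITION_THRESHOLD_M = 0.050   # 50 mm
ANGULAR_THRESHOLD_RAD = 0.25   # about 14.3 degrees


def angle_between_axes(u, v):
    """Unsigned angle in radians between two 3D joint axes.

    Assumption: an axis and its negation describe the same joint line,
    so the sign of the direction is ignored.
    """
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("joint axes must be non-zero vectors")
    cos_angle = abs(sum(a * b for a, b in zip(u, v))) / (norm_u * norm_v)
    # Clamp to guard against rounding slightly above 1.0.
    return math.acos(min(1.0, cos_angle))


def joint_within_thresholds(pred_axis, true_axis, pred_origin, true_origin):
    """Hypothetical check: True if both errors fall within the thresholds.

    Assumption: position error is the Euclidean distance between joint
    origins, expressed in metres.
    """
    angular_error = angle_between_axes(pred_axis, true_axis)
    position_error = math.dist(pred_origin, true_origin)
    return (angular_error <= ANGULAR_THRESHOLD_RAD
            and position_error <= POSITION_THRESHOLD_M)
```

For example, under these assumptions a predicted axis tilted by 0.2 radian with an origin offset of 30 mm would pass, while a tilt of 0.3 radian or an offset of 60 mm would fail.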