Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Interactive Anomaly Detection for Articulated Objects via Motion Anticipation
Authors: Ankan Bhunia, Changjian Li, Hakan Bilen
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper presents a novel problem, interactive anomaly detection (AD) for articulated objects, and introduces a tailored solution that detects functional anomalies by integrating vision, interaction, and anticipation. Unlike traditional AD methods that rely on passive visual observations, our approach actively manipulates objects to reveal anomalies that would otherwise remain hidden. Our method learns to generate a sequence of actions to interact exclusively with normal objects and to anticipate the resulting normal motion. During inference, the model applies predicted actions to the object and compares the observed motion with the anticipated motion to detect anomalies. Additionally, we introduce a new benchmark, PartNet-IAD, for interactive AD, which includes articulated objects with realistic functional anomalies. Experiments show strong generalization to detect anomalies in both seen and unseen object categories. |
| Researcher Affiliation | Academia | Ankan Bhunia Changjian Li Hakan Bilen University of Edinburgh |
| Pseudocode | No | The paper describes the approach in text and uses diagrams (e.g., Figure 2) to illustrate the architecture, but it does not contain explicit pseudocode or algorithm blocks with structured, code-like steps. |
| Open Source Code | No | We will release the dataset and model, the source code that is used to generate the data and to train the models upon publication. |
| Open Datasets | Yes | To support this task, we introduce PartNet-IAD, a new benchmark for interactive AD. Built upon the PartNet-Mobility dataset [39], we inject realistic functional anomalies into articulated objects and simulate interaction environments where a robotic arm can push and pull object parts. ... To evaluate the generalization ability of the learned motion prior, we use real-world articulated objects from the AKB-48 dataset [26]. |
| Dataset Splits | Yes | We focus on two evaluation settings: i) evaluate our model on the test set of the training categories to measure its generalization ability to unseen objects within the same categories, and ii) evaluate on the test set of the testing categories to measure the generalization ability to unseen object categories. For training the normal motion anticipation module, we use a mutually exclusive set of 402 normal objects (from the train set of the training categories of PartNet-Mobility). |
| Hardware Specification | Yes | The offline dataset consists of 145M interaction pairs, generated over 3-4 days on a single 64-core CPU machine by parallelizing the simulation across multiple CPU cores, to kickstart the training. |
| Software Dependencies | No | We create interactive simulated environments by using PyBullet [9]. ... A Point Transformer3 (PTv3) [37] processes the point cloud... ... apply an optical flow method [35] to predict 2D motion flow. The paper refers to specific software but does not include version numbers in the provided text. |
| Experiment Setup | Yes | The camera is placed on the upper hemisphere with a random azimuth [120°, 270°) and altitude [25°, 40°] ... The agent then moves the end-effector 0.18 meters along the action direction. ... To complement the offline dataset, online data sampling [28] (see supplementary) is introduced after the 10th epoch, with each batch consisting of 70% offline data and 30% online data ... During inference, we use the current memory mask B+ to sample Np = 1000 points for the starting interaction of each part and Np = 250 points after the first timestep once the part is identified. We sample Nu = 128 action directions to form the total action space. The temperature λT is set to 0.3 in Eq. (8). Tmax is set to 15. |
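
The numeric settings quoted in the Experiment Setup row can be gathered into a small sketch. The Python below is illustrative only and is not the authors' code: the constant and function names are hypothetical, the angle ranges are assumed to be in degrees, epochs are assumed to be counted from 1 (so online data first appears in epoch 11), and the camera distance is omitted because the quoted text does not state it.

```python
import math
import random

# Values quoted in the Experiment Setup row; all names here are hypothetical.
AZIMUTH_RANGE_DEG = (120.0, 270.0)   # quoted as half-open [120, 270)
ALTITUDE_RANGE_DEG = (25.0, 40.0)    # quoted as closed [25, 40]
OFFLINE_FRACTION = 0.7               # 70% offline, 30% online per batch
ONLINE_START_EPOCH = 10              # online sampling is introduced after this epoch


def sample_camera_direction(rng):
    """Sample a viewing direction on the upper hemisphere.

    Returns (azimuth_deg, altitude_deg, unit_vector). Interval endpoints are
    only approximately respected because of floating-point rounding.
    """
    az = rng.uniform(*AZIMUTH_RANGE_DEG)
    alt = rng.uniform(*ALTITUDE_RANGE_DEG)
    az_r, alt_r = math.radians(az), math.radians(alt)
    # Spherical-to-Cartesian conversion with z pointing up (an assumed convention).
    direction = (
        math.cos(alt_r) * math.cos(az_r),
        math.cos(alt_r) * math.sin(az_r),
        math.sin(alt_r),
    )
    return az, alt, direction


def batch_composition(batch_size, epoch):
    """Return (n_offline, n_online) samples for a batch in a 1-indexed epoch."""
    if epoch <= ONLINE_START_EPOCH:
        return batch_size, 0  # offline data only during the first 10 epochs
    n_offline = round(batch_size * OFFLINE_FRACTION)
    return n_offline, batch_size - n_offline


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_camera_direction(rng))
    print(batch_composition(100, epoch=11))
```

Because the altitude is always positive, the z component of the sampled direction stays above the horizontal plane, which matches the "upper hemisphere" placement described in the row.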