Primitive-Based 3D Human-Object Interaction Modelling and Programming

Authors: Siqi Liu, Yong-Lu Li, Zhou Fang, Xinpeng Liu, Yang You, Cewu Lu

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To explore an effective embedding of HAOI for the machine, we build a new benchmark on 3D HAOI consisting of primitives together with their images and propose a task requiring machines to recover 3D HAOI using primitives from images. Moreover, we propose a baseline of single-view 3D reconstruction on HAOI. ... In experiments, our method achieves decent recovery performance. ... Quantitative Results. Similar to previous works (Xie, Bhatnagar, and Pons-Moll 2022; Xu et al. 2021), Chamfer distance is measured in centimeters. For comparison, we set the human height to 175 cm, the same as in D3D-HOI. The quantitative results of the mean Chamfer distance (cm) are shown in Tab. 2. Results show that CHORE fails to handle articulated object reconstruction, as it needs to train a CHORE field first. ... Our method captures more information in images for articulated objects compared to D3D-HOI. ... In Tab. 3, we ablate our proposed method to analyze some essential components.
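The evaluation metric quoted above is the mean Chamfer distance in centimetres, with the human scaled to a height of 175 cm. The paper does not spell out the exact variant it uses; the sketch below assumes the common symmetric form (mean nearest-neighbour distance, averaged over both directions), which is only one of several conventions in the literature:

```python
import numpy as np

def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3).

    Units follow the input coordinates, so scaling the human mesh so that
    it stands 175 cm tall yields a result in centimetres. Note: some works
    instead sum (rather than average) the two directions, or use squared
    distances; this is an assumed convention, not the paper's definition.
    """
    # Pairwise Euclidean distances, shape (N, M).
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)
    # Nearest neighbour in each direction, then average the two means.
    return float(d.min(axis=1).mean() + d.min(axis=0).mean()) / 2.0

# Example: two point sets offset by 1 unit along x.
a = np.zeros((4, 3))
b = np.zeros((4, 3))
b[:, 0] = 1.0
print(chamfer_distance(a, b))  # 1.0
```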
Researcher Affiliation | Academia | Siqi Liu, Yong-Lu Li*, Zhou Fang, Xinpeng Liu, Yang You, Cewu Lu* Shanghai Jiao Tong University {magi-yunan, yonglu li, joefang, qq456cvb, lucewu}@sjtu.edu.cn, xinpengliu0907@gmail.com
Pseudocode | No | The paper describes the methodology in prose (Sections 3 and 4) but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks or figures.
Open Source Code | Yes | Our code and data are available at https://mvig-rhos.com/p3haoi.
Open Datasets | Yes | To explore an effective embedding of HAOI for the machine, we build a new benchmark on 3D HAOI consisting of primitives together with their images and propose a task requiring machines to recover 3D HAOI using primitives from images. ... Our code and data are available at https://mvig-rhos.com/p3haoi.
Dataset Splits | No | The paper states, 'For each object category, 70% is used as the training set and 30% is used as the testing set.' It does not explicitly mention a separate validation split or its size.
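The quoted split is per category, not over the pooled dataset. A minimal sketch of such a split, assuming a random per-category partition (the paper does not state how samples are assigned, so the shuffle and seed here are illustrative only):

```python
import random

def split_by_category(samples, train_frac=0.7, seed=0):
    """Split (category, item) pairs into train/test sets, taking
    train_frac of EACH category, as the quoted 70%/30% split requires.
    The random shuffle and fixed seed are assumptions for illustration."""
    rng = random.Random(seed)
    by_cat = {}
    for cat, item in samples:
        by_cat.setdefault(cat, []).append(item)
    train, test = [], []
    for cat, items in by_cat.items():
        rng.shuffle(items)
        cut = int(round(len(items) * train_frac))
        train += [(cat, it) for it in items[:cut]]
        test += [(cat, it) for it in items[cut:]]
    return train, test

# Two hypothetical categories with 10 samples each -> 7/3 per category.
samples = [("chair", i) for i in range(10)] + [("laptop", i) for i in range(10)]
train, test = split_by_category(samples)
print(len(train), len(test))  # 14 6
```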
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. It mentions software like DINOv2 and Nvdiffrast, but not the underlying hardware.
Software Dependencies | No | The paper mentions several software tools and frameworks used, such as 'Nvdiffrast', 'DINOv2', 'Detectron2', 'ROMP', 'Openpose', 'Alphapose', 'EFT', and 'Blender', but it does not provide specific version numbers for any of these dependencies.
Experiment Setup | Yes | Our training process is composed of 3 steps. (1) The object network is frozen and we train for the human. (2) The human network is frozen and we train the object. (3) Both the human network and the object network are fine-tuned, and in this step, we do joint optimization. During training, the three steps are continuous without interruption. We determine at which epoch to proceed to the next step based on whether the corresponding losses in each stage have been stable. In Steps One and Two, the weights of the HOI loss and interpenetration loss are set to 0; once Step Three starts, both weights are set to 1 to train the network for joint optimization. ... We add CNN layers after the DINOv2 feature extractor to fine-tune the predicted features. ... The parameters of the mesh encoder are frozen during training and testing.
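The three-step schedule above alternates which network is trainable and gates the joint losses. A minimal sketch of that schedule as a configuration function (the names `train_human`, `w_hoi`, etc. are hypothetical; the paper decides step transitions by loss stability, which is abstracted away here as an explicit step index):

```python
def stage_config(step: int) -> dict:
    """Return which networks are trainable and the joint-loss weights
    for each of the paper's three training steps:
      (1) object frozen, train human;
      (2) human frozen, train object;
      (3) joint fine-tuning with HOI and interpenetration weights 0 -> 1.
    All field names are illustrative assumptions, not the paper's code."""
    if step not in (1, 2, 3):
        raise ValueError("step must be 1, 2, or 3")
    joint = 1.0 if step == 3 else 0.0  # HOI + interpenetration loss weights
    return {
        "train_human": step in (1, 3),
        "train_object": step in (2, 3),
        "w_hoi": joint,
        "w_interpenetration": joint,
    }

for s in (1, 2, 3):
    print(s, stage_config(s))
```

In a framework such as PyTorch, `train_human`/`train_object` would typically map to setting `requires_grad` on each sub-network's parameters, and the two weights would scale the corresponding loss terms in the total objective.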