GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion

Authors: Xueyi Liu, Li Yi

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on four benchmarks with significant domain variations demonstrate the superior effectiveness of our method.
Researcher Affiliation | Collaboration | Xueyi Liu (1,3), Li Yi (1,2,3); (1) Tsinghua University, (2) Shanghai AI Laboratory, (3) Shanghai Qi Zhi Institute
Pseudocode | Yes | Algorithm 1 Denoising via Diffusion
Open Source Code | Yes | We will release our code to support future research. Project website: meowuu7.github.io/GeneOH-Diffusion
Open Datasets | Yes | All models are trained on the GRAB dataset (Taheri et al., 2020). ... We evaluate our model and baselines on four distinct test sets, namely GRAB test set with Gaussian noise, GRAB (Beta) test set with noise sampled from a Beta distribution (B(8, 2)), HOI4D dataset (Liu et al., 2022) with real noise patterns... and ARCTIC dataset (Fan et al., 2023).
Dataset Splits | Yes | We follow the cross-object splitting strategy used in TOCH (Zhou et al., 2022) and train models on the training set. ... The training split, containing 1308 manipulation sequences, is used to construct the training dataset.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or cloud computing instance types, used for running the experiments. It mentions inference time in Appendix C.6 but no hardware specifics.
Software Dependencies | No | The paper mentions adapting the implementation of Human Motion Diffusion (Tevet et al., 2022), using Open3D (Zhou et al., 2018), and the scikit-learn package for PCA, but it does not specify exact version numbers for these software components.
Experiment Setup | Yes | The denoising model for J. The denoising model for the canonicalized hand trajectory J is trained on canonicalized hand trajectories {J} of all interaction sequences in the training set. We apply per-instance normalization operation to those points at each frame for centralization and scaling purposes. ... The denoising model for S. Similarly, the denoising model for hand-object spatial relations S is trained using representations {S} from all interaction sequences in the training set. ... The denoising model for T. When training the denoising model for the hand-object temporal relations T, we first train an autoencoder... The diffusion steps is set to 400 for MotionDiff, 200 for SpatialDiff, and 100 for TemporalDiff empirically.
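
The Experiment Setup entry mentions a per-instance normalization applied to the points at each frame "for centralization and scaling purposes", along with per-stage diffusion step counts. The following is a minimal sketch of what such a step could look like, not the authors' implementation. The function names, the dictionary keys, and the choice of scale factor (largest distance from the centroid) are all assumptions made for illustration; the quoted text does not say which scale factor is used.

```python
import math

# Step counts as quoted in the Experiment Setup entry (set empirically).
# The dictionary keys are hypothetical labels for the three stages.
DIFFUSION_STEPS = {"motion": 400, "spatial": 200, "temporal": 100}


def normalize_frame(points, eps=1e-8):
    """Center one frame's 3D points and scale them to unit radius.

    Assumption: "scaling" is taken to mean dividing by the largest
    distance from the centroid; the source does not specify this.
    """
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in points]
    scale = max(math.sqrt(sum(c * c for c in p)) for p in centered)
    scale = max(scale, eps)  # guard against a degenerate single-point frame
    normalized = [tuple(c / scale for c in p) for p in centered]
    return normalized, centroid, scale


def denormalize_frame(normalized, centroid, scale):
    """Invert normalize_frame using the stored centroid and scale."""
    return [
        tuple(c * scale + centroid[i] for i, c in enumerate(p))
        for p in normalized
    ]


def normalize_sequence(frames):
    """Normalize every frame independently (per-instance, per-frame)."""
    return [normalize_frame(frame) for frame in frames]
```

Returning the centroid and scale alongside the normalized points lets a denoised frame be mapped back to its original coordinates with `denormalize_frame`.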