GeneOH Diffusion: Towards Generalizable Hand-Object Interaction Denoising via Denoising Diffusion
Authors: Xueyi Liu, Li Yi
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on four benchmarks with significant domain variations demonstrate the superior effectiveness of our method. |
| Researcher Affiliation | Collaboration | Xueyi Liu¹˒³, Li Yi¹˒²˒³; ¹Tsinghua University, ²Shanghai AI Laboratory, ³Shanghai Qi Zhi Institute |
| Pseudocode | Yes | Algorithm 1 Denoising via Diffusion |
| Open Source Code | Yes | We will release our code to support future research. Project website: meowuu7.github.io/GeneOH-Diffusion |
| Open Datasets | Yes | All models are trained on the GRAB dataset (Taheri et al., 2020). ... We evaluate our model and baselines on four distinct test sets, namely GRAB test set with Gaussian noise, GRAB (Beta) test set with noise sampled from a Beta distribution (B(8, 2)), HOI4D dataset (Liu et al., 2022) with real noise patterns... and ARCTIC dataset (Fan et al., 2023). |
| Dataset Splits | Yes | We follow the cross-object splitting strategy used in TOCH (Zhou et al., 2022) and train models on the training set. ... The training split, containing 1308 manipulation sequences, is used to construct the training dataset. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or cloud computing instance types, used for running the experiments. It mentions inference time in Appendix C.6 but no hardware specifics. |
| Software Dependencies | No | The paper mentions adapting the implementation of Human Motion Diffusion (Tevet et al., 2022), using Open3D (Zhou et al., 2018), and the scikit-learn package for PCA, but it does not specify exact version numbers for these software components. |
| Experiment Setup | Yes | **The denoising model for J.** The denoising model for the canonicalized hand trajectory J is trained on canonicalized hand trajectories {J} of all interaction sequences in the training set. We apply per-instance normalization operation to those points at each frame for centralization and scaling purposes. ... **The denoising model for S.** Similarly, the denoising model for hand-object spatial relations S is trained using representations {S} from all interaction sequences in the training set. ... **The denoising model for T.** When training the denoising model for the hand-object temporal relations T, we first train an autoencoder... The diffusion steps is set to 400 for MotionDiff, 200 for SpatialDiff, and 100 for TemporalDiff empirically. |
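
The setup quoted in the table can be illustrated with a minimal Python sketch. It is not the authors' implementation. The step counts come from the Experiment Setup row and the Beta(8, 2) distribution from the Open Datasets row. Everything else is an assumption made for illustration: the names `DIFFUSION_STEPS`, `normalize_frame`, and `perturb_beta`, the choice of scale factor (largest distance to the centroid), and the `magnitude` parameter and per-coordinate application of the noise.

```python
import math
import random

# Diffusion step counts quoted in the Experiment Setup row.
DIFFUSION_STEPS = {"MotionDiff": 400, "SpatialDiff": 200, "TemporalDiff": 100}


def normalize_frame(points):
    """Center one frame of 3D points at its centroid and scale it to unit radius.

    Assumption: "scaling" is taken to mean dividing by the largest distance
    to the centroid; the source does not state which scale factor is used.
    """
    n = len(points)
    centroid = tuple(sum(p[i] for p in points) / n for i in range(3))
    centered = [tuple(p[i] - centroid[i] for i in range(3)) for p in points]
    radius = max(math.sqrt(sum(c * c for c in q)) for q in centered)
    scale = radius if radius > 0 else 1.0  # avoid dividing by zero
    return [tuple(c / scale for c in q) for q in centered]


def perturb_beta(points, magnitude=0.01, rng=None):
    """Add per-coordinate noise drawn from Beta(8, 2).

    Assumption: `magnitude` and the per-coordinate application are
    hypothetical; the source only names the distribution.
    """
    rng = rng or random.Random(0)
    return [
        tuple(c + magnitude * rng.betavariate(8, 2) for c in p)
        for p in points
    ]


# Example: normalize a small frame, then perturb it.
frame = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 2.0)]
normalized = normalize_frame(frame)
noisy = perturb_beta(normalized)
```

Normalizing each frame independently keeps the inputs in a comparable range regardless of where the interaction takes place. Beta(8, 2) is skewed towards 1, so this kind of perturbation is biased rather than zero-mean, unlike Gaussian noise.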