Pose Modulated Avatars from Video

Authors: Chunjin Song, Bastian Wandt, Helge Rhodin

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments demonstrate that our network outperforms state-of-the-art methods in terms of preserving details and generalization capabilities. Our code is available at https://github.com/ChunjinSong/PM-Avatars. We conduct thorough evaluation and ablation studies, which delve into the importance of window functions and frequency modulations with state-of-the-art results.
Researcher Affiliation | Academia | Chunjin Song1, Bastian Wandt2 & Helge Rhodin1,3; 1Department of Computer Science, University of British Columbia; 2Department of Electrical Engineering, Linköping University; 3Bielefeld University; {chunjins,rhodin}@cs.ubc.ca, bastian.wandt@liu.se
Pseudocode | No | The paper describes the method using text and mathematical equations, but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available at https://github.com/ChunjinSong/PM-Avatars.
Open Datasets | Yes | We evaluate our method on widely recognized benchmarks for body modeling. Following the protocol established by Anim-NeRF, we perform comparisons on the seven actors of the Human3.6M dataset (Ionescu et al., 2011; 2013; Peng et al., 2021a) ... Like DANBO, we also apply MonoPerfCap (Xu et al., 2018) as a high-resolution dataset... We train our method and three baselines, including HumanNeRF (Weng et al., 2022), MonoHuman (Yu et al., 2023), and Vid2Avatar (Guo et al., 2023), over four ZJU-MoCap sequences (S377, S387, S393 and S394)...
Dataset Splits | No | The paper mentions training on 'the first part of a video' and testing on 'remaining frames' for novel pose synthesis, and using a 'subset of cameras' for learning with the 'remaining cameras' as the test set for novel view synthesis. It also states 'the data split also stays the same as the aforementioned methods for a fair comparison.' However, it does not explicitly provide specific percentages or counts for training, validation, and test splits.
Hardware Specification | Yes | We train our network on a single NVIDIA RTX 3090 GPU for about 20 hours.
Software Dependencies | No | Our method is implemented using PyTorch (Paszke et al., 2019). While PyTorch is mentioned, no specific version number is provided for PyTorch itself, Python, or other key libraries.
Experiment Setup | Yes | For consistency, we maintain the same hyper-parameter settings across various testing experiments, including the loss function with weight λs, the number of training iterations, and the network capacity and learning rate. All hyper-parameters are chosen depending on the final accuracy on the chosen benchmarks. We utilize the Adam optimizer (Kingma & Ba, 2014) with default parameters β1 = 0.9 and β2 = 0.99. We employ a step decay schedule to adjust the learning rate, where the initial learning rate is set to 5 × 10⁻⁴ and we drop the learning rate to 10% every 500,000 iterations. Like former methods (Su et al., 2021; 2022), we set λs = 0.001 and NB = 24 to accurately capture the topology variations and avoid introducing unnecessary training changes. The learnable parameters in the GNN, window function and frequency modulation parts are activated by the Sine function, while the other parameters in the neural field F are activated by ReLU (Agarap, 2018).
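
The reported optimizer and learning-rate schedule can be expressed as a short PyTorch sketch. This is a minimal illustration only, assuming PyTorch's built-in Adam and StepLR; the placeholder model and loss below are hypothetical stand-ins for the paper's actual neural field and loss with weight λs, which are not reproduced here.

import torch

# Hypothetical stand-in for the paper's neural field F; the real architecture
# (GNN, window function, frequency modulation, Sine/ReLU activations) is not shown.
model = torch.nn.Linear(3, 1)

# Adam with the reported defaults beta1 = 0.9, beta2 = 0.99 and initial lr 5e-4.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.99))

# Step decay: drop the learning rate to 10% of its value every 500,000 iterations.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500_000, gamma=0.1)

for iteration in range(1_000_000):
    optimizer.zero_grad()
    loss = model(torch.randn(1, 3)).pow(2).mean()  # placeholder loss, not the paper's
    loss.backward()
    optimizer.step()
    scheduler.step()  # advance the step-decay schedule once per training iteration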