Pose Modulated Avatars from Video
Authors: Chunjin Song, Bastian Wandt, Helge Rhodin
ICLR 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that our network outperforms state-of-the-art methods in terms of preserving details and generalization capabilities. Our code is available at https://github.com/ChunjinSong/PM-Avatars. We conduct thorough evaluation and ablation studies, which delve into the importance of window functions and frequency modulations with state-of-the-art results. |
| Researcher Affiliation | Academia | Chunjin Song1, Bastian Wandt2 & Helge Rhodin1,3 1Department of Computer Science, University of British Columbia 2Department of Electrical Engineering, Linköping University 3Bielefeld University {chunjins,rhodin}@cs.ubc.ca bastian.wandt@liu.se |
| Pseudocode | No | The paper describes the method using text and mathematical equations, but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available at https://github.com/ChunjinSong/PM-Avatars. |
| Open Datasets | Yes | We evaluate our method on widely recognized benchmarks for body modeling. Following the protocol established by Anim-NeRF, we perform comparisons on the seven actors of the Human3.6M dataset (Ionescu et al., 2011; 2013; Peng et al., 2021a) ... Like DANBO, we also apply MonoPerfCap (Xu et al., 2018) as a high-resolution dataset... We train our method and three baselines, including HumanNeRF (Weng et al., 2022), MonoHuman (Yu et al., 2023), and Vid2Avatar (Guo et al., 2023), over four ZJU-MoCap sequences (S377, S387, S393 and S394)... |
| Dataset Splits | No | The paper mentions training on 'the first part of a video' and testing on 'remaining frames' for novel pose synthesis, and using a 'subset of cameras' for learning and 'remaining cameras' as the test set for novel view synthesis. It also states 'the data split also stays the same as the aforementioned methods for a fair comparison.' However, it does not explicitly provide the specific percentages or counts for training, validation, and test splits within the paper itself. |
| Hardware Specification | Yes | We train our network on a single NVidia RTX 3090 GPU for about 20 hours. |
| Software Dependencies | No | Our method is implemented using PyTorch (Paszke et al., 2019). While PyTorch is mentioned, a specific version number for PyTorch itself, Python, or other key libraries is not provided. |
| Experiment Setup | Yes | For consistency, we maintain the same hyper-parameter settings across various testing experiments, including the loss function with weight λs, the number of training iterations, and the network capacity and learning rate. All the hyper-parameters are chosen depending on the final accuracy on chosen benchmarks. We utilize the Adam optimizer (Kingma & Ba, 2014) with default parameters β1 = 0.9 and β2 = 0.99. We employ the step decay schedule to adjust the learning rate, where the initial learning rate is set to 5 × 10⁻⁴ and we drop the learning rate to 10% every 500,000 iterations. Like former methods (Su et al., 2021; 2022), we set λs = 0.001 and NB = 24 to accurately capture the topology variations and avoid introducing unnecessary training changes. The learnable parameters in GNN, window function and the frequency modulation part are activated by the sine function while other parameters in the neural field F are activated by ReLU (Agarap, 2018). |
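
The optimizer and learning-rate schedule quoted in the experiment-setup row map directly onto standard PyTorch components. The sketch below is a minimal illustration of that configuration only; the `model`, dummy data, and loss terms are placeholders and do not reproduce the paper's neural field, window functions, or frequency modulation.

```python
import torch
import torch.nn as nn

# Placeholder network standing in for the paper's neural field; the actual
# architecture (GNN, window functions, frequency modulation, sine/ReLU split)
# is not reproduced here.
model = nn.Sequential(nn.Linear(3, 256), nn.ReLU(), nn.Linear(256, 4))

# Adam with beta1 = 0.9, beta2 = 0.99 and an initial learning rate of 5e-4,
# as quoted from the paper.
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.9, 0.99))

# Step decay: multiply the learning rate by 0.1 every 500,000 iterations.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=500_000, gamma=0.1)

lambda_s = 0.001  # loss weight lambda_s from the quoted setup

# Illustrative training step with dummy data; the paper's actual losses
# (a photometric term plus a regularizer weighted by lambda_s) are not shown.
x, target = torch.randn(8, 3), torch.randn(8, 4)
pred = model(x)
loss = nn.functional.mse_loss(pred, target) + lambda_s * pred.abs().mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()
scheduler.step()  # called once per iteration so the decay tracks the iteration count
```

Calling `scheduler.step()` per iteration (rather than per epoch) matches the quoted schedule, which is expressed in iterations.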