Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction
Authors: Yunji Kim, Seonghyeon Nam, In Cho, Seon Joo Kim
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our method is successfully applied to various datasets without the cost of labeling keypoints in videos. The detected keypoints are similar to human-annotated labels, and the prediction results are more realistic than those of previous methods, even the ones that utilize human-annotated keypoint labels. |
| Researcher Affiliation | Collaboration | Yunji Kim1, Seonghyeon Nam1, In Cho1, and Seon Joo Kim1,2 1Yonsei University 2Facebook {kim_yunji,shnnam,join,seonjookim}@yonsei.ac.kr |
| Pseudocode | No | The paper describes the algorithms and training processes in text and diagrams but does not include formal pseudocode blocks or algorithm listings. |
| Open Source Code | No | The paper mentions using 'codes released by the authors' for baseline models, but it does not state that its own code for the proposed method is open-source or available. |
| Open Datasets | Yes | Penn Action: this dataset [25] consists of videos of humans performing sports actions. The UvA-NEMO dataset [38] consists of 1234 videos of smiling human faces... The MGIF dataset [31] consists of videos of cartoon animal characters... |
| Dataset Splits | Yes | Due to the lack of data, only 10 samples per class were used as the test set and the rest as the training set... The final dataset consists of 1172 training videos and 90 test videos. The UvA-NEMO dataset [38] consists of 1234 videos of smiling human faces, split into 1110 videos for the training set and 124 for the evaluation set. For the MGIF dataset, 900 videos are used for training and 100 for evaluation. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | We implemented our method using TensorFlow with the Adam optimizer [41]. While TensorFlow and the Adam optimizer are mentioned, no version numbers for TensorFlow or any other software dependency are provided. |
| Experiment Setup | Yes | The resolution of both the input and the output images is 128×128, and the number of keypoints K was set to 40, 15, and 60 for the Penn Action, UvA-NEMO, and MGIF datasets, respectively. We implemented our method using TensorFlow with the Adam optimizer [41], a learning rate of 0.0001, a batch size of 32, and the two momentum values of 0.5 and 0.999. We decreased the learning rate by a factor of 0.95 every 20,000 iterations. Considering the tendency of the convergence, λ1, λ2, and λ3 were set to 1, 1000, and 2, respectively. |
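The training schedule reported in the Experiment Setup row can be sketched as follows. This is a minimal, framework-free illustration of the stated hyperparameters (learning rate 0.0001 decayed by 0.95 every 20,000 iterations, Adam momenta 0.5 and 0.999); the function and dictionary names are illustrative, not from the paper, and the paper's actual TensorFlow implementation is not public.

```python
def decayed_lr(step, base_lr=1e-4, decay=0.95, interval=20_000):
    """Stepwise exponential decay as reported: multiply the learning
    rate by 0.95 after every 20,000 training iterations."""
    return base_lr * decay ** (step // interval)

# Adam hyperparameters as reported (momentum values 0.5 and 0.999);
# ADAM_KWARGS is an illustrative name, not taken from the paper.
ADAM_KWARGS = {
    "beta_1": 0.5,
    "beta_2": 0.999,
    "learning_rate": decayed_lr(0),  # 1e-4 at the start of training
}
```

For example, `decayed_lr(40_000)` returns the base rate scaled by 0.95² , matching the reported schedule after two decay intervals.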