Entropy-driven Unsupervised Keypoint Representation Learning in Videos
Authors: Ali Younes, Simone Schaub-Meyer, Georgia Chalvatzaki
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results show superior performance for our information-driven keypoints that resolve challenges like attendance to static and dynamic objects or objects abruptly entering and leaving the scene. We provide qualitative and quantitative empirical results on four different video-datasets against strong baselines for unsupervised temporal keypoint discovery, unveiling the superior representation power of MINT. |
| Researcher Affiliation | Academia | 1Department of Computer Science, Technische Universität Darmstadt, Germany 2Hessian.AI 3Center for Mind, Brain and Behavior (CMBB), Uni. Marburg and JLU Giessen. |
| Pseudocode | No | No explicit pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Our code is available under an open-source license at: https://github.com/iROSA-lab/MINT. |
| Open Datasets | Yes | We use CLEVRER (Yi et al., 2019), a dataset for visual reasoning with complete object annotations, containing videos with static and dynamic objects, with good variability in scenes, as a testbed. |
| Dataset Splits | No | For the train/test split, the paper states 'We train all keypoint detectors on a subset of 20 videos from CLEVRER and test them on 100.', but no explicit validation split details are provided. |
| Hardware Specification | Yes | In our experiments, we used a PC with a GPU NVIDIA Tesla V100-DGXS-32GB. |
| Software Dependencies | No | The paper mentions PyTorch (Paszke et al., 2019) and CUDA but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Table 6 provides the hyperparameters used for CLEVRER (Yi et al., 2019) in our experiments. We use the same values for all other datasets, i.e., also for MIME (Sharma et al., 2018), SIMITATE (Memmesheimer et al., 2019), and MAGICAL (Toyer et al., 2020). The only exceptions are the activation threshold γ, the std for heatmap σGi and the threshold of the heatmap τ, where these values depend on the size of the input image (i.e., γ = 15, σGi = 9.0, τ = 0.1 for MIME; γ = 10, σGi = 9.0, τ = 0.5 for SIMITATE; γ = 10, σGi = 7.0, τ = 0.3 for MAGICAL). Table 6 (hyperparameters): learning rate = 0.001; clip value = 10.0; weight decay = 0.00001; epochs = 100; num keypoints K = 25; number of stacked frames = 3; activation threshold γ = 15; entropy region size √\|R\| = 3; std for heatmap σGi = 9.0; threshold for heatmap τ = 0.1; thresholded heatmap scale η = 3.5; CE contribution (IT) κ = 0.5; movement weight (IT) md = 1.0; ME weight λME = 100; MCE weight λMCE = 100; IT weight λIT = 20; active status weight λs = 10; overlapping weight λo = 30. |
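
Since the reported setup amounts to a single set of hyperparameters plus per-dataset overrides, a minimal sketch of how these values could be collected into a training configuration is shown below. All key names and the `config_for` helper are illustrative assumptions, not taken from the MINT repository; only the numeric values come from Table 6 and the exceptions quoted above.

```python
# Sketch of a training config reproducing the reported hyperparameters.
# Key names are illustrative; values follow Table 6 (CLEVRER defaults).
CLEVRER_CONFIG = {
    "learning_rate": 0.001,
    "clip_value": 10.0,
    "weight_decay": 0.00001,
    "epochs": 100,
    "num_keypoints": 25,               # K
    "stacked_frames": 3,
    "activation_threshold": 15,        # gamma
    "entropy_region_size": 3,          # sqrt(|R|)
    "heatmap_std": 9.0,                # sigma_Gi
    "heatmap_threshold": 0.1,          # tau
    "thresholded_heatmap_scale": 3.5,  # eta
    "ce_contribution_it": 0.5,         # kappa
    "movement_weight_it": 1.0,         # m_d
    "lambda_me": 100,
    "lambda_mce": 100,
    "lambda_it": 20,
    "lambda_active_status": 10,
    "lambda_overlapping": 30,
}

# Image-size-dependent overrides for the other datasets, as stated above.
DATASET_OVERRIDES = {
    "MIME":     {"activation_threshold": 15, "heatmap_std": 9.0, "heatmap_threshold": 0.1},
    "SIMITATE": {"activation_threshold": 10, "heatmap_std": 9.0, "heatmap_threshold": 0.5},
    "MAGICAL":  {"activation_threshold": 10, "heatmap_std": 7.0, "heatmap_threshold": 0.3},
}

def config_for(dataset: str) -> dict:
    """Return the CLEVRER defaults with any dataset-specific overrides applied."""
    cfg = dict(CLEVRER_CONFIG)
    cfg.update(DATASET_OVERRIDES.get(dataset, {}))
    return cfg
```

For example, `config_for("SIMITATE")` yields the CLEVRER defaults with γ = 10, σGi = 9.0, and τ = 0.5, matching the exceptions listed in the Experiment Setup row.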