Unsupervised Learning of View-invariant Action Representations
Authors: Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the effectiveness of our learned representations for action recognition on multiple datasets. Our method outperforms state-of-the-art unsupervised methods across multiple datasets. (Section 4, Experiments) |
| Researcher Affiliation | Academia | Junnan Li, Grad. School for Integrative Sciences and Engineering, National University of Singapore, Singapore (lijunnan@u.nus.edu); Yongkang Wong, School of Computing, National University of Singapore, Singapore (yongkang.wong@nus.edu.sg); Qi Zhao, Dept. of Computer Science and Engineering, University of Minnesota, Minneapolis, USA (qzhao@cs.umn.edu); Mohan S. Kankanhalli, School of Computing, National University of Singapore, Singapore (mohan@comp.nus.edu.sg) |
| Pseudocode | No | The paper describes the components of the learning framework and the optimization process but does not include any pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about releasing the source code for their methodology. |
| Open Datasets | Yes | We use the NTU RGB+D dataset [49] for unsupervised representation learning. |
| Dataset Splits | Yes | For cross-subject evaluation, we follow the same training and testing split as in [49]. For cross-view evaluation, samples of cameras 2 and 3 are used for training, while those of camera 1 are used for testing. Since we need at least two cameras for our unsupervised task, we randomly divide the supervised training set with a ratio of 8:1 for unsupervised training and testing (see the split sketch after the table). |
| Hardware Specification | No | The paper describes computational parameters such as mini-batch size and optimizer settings but does not provide specific details on the hardware (e.g., GPU/CPU models, memory) used for running experiments. |
| Software Dependencies | No | The paper mentions deep learning architectures and optimizers (e.g., ResNet-18, Bi-directional convolutional LSTM, Adam optimizer) but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Implementation details. For Conv in the encoder and the depth CNN in the cross-view decoder, we employ the ResNet-18 architecture [15] up until the final convolution layer, and add a 1×1×64 convolutional layer to reduce the feature size. ... For BiLSTM, we use convolutional filters of size 7×7×64 for convolution with the input and hidden state. We initialize all weights following the method in [14]. During training, we use a mini-batch of size 8. We train the model using the Adam optimizer [20], with an initial learning rate of 1e-5 and a weight decay of 5e-4. We decrease the learning rate by half every 20000 steps (mini-batches). To avoid distracting the flow prediction task, we activate the view adversarial training after 5000 steps. The weights of the loss terms are set as α = 0.5 and β = 0.05, which are determined via cross-validation (see the training-loop sketch after the table). |
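
The 8:1 unsupervised train/test division quoted in the Dataset Splits row amounts to a seeded random split of the cross-view supervised training set. A minimal sketch, assuming `sample_ids` holds the sample identifiers; the function name and seed are hypothetical, not from the paper:

```python
import random

def split_unsupervised(sample_ids, ratio=8, seed=0):
    """Randomly divide samples into unsupervised train/test sets at ratio:1."""
    rng = random.Random(seed)
    ids = list(sample_ids)
    rng.shuffle(ids)
    n_train = len(ids) * ratio // (ratio + 1)
    return ids[:n_train], ids[n_train:]

# Example: an 8:1 division of 900 samples yields 800 train / 100 test
train_ids, test_ids = split_unsupervised(range(900))
assert len(train_ids) == 800 and len(test_ids) == 100
```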
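
The optimizer, learning-rate schedule, and delayed adversarial term from the Experiment Setup row translate directly into a standard PyTorch training loop. A minimal sketch under stated assumptions: `model` and the three loss functions are dummy placeholders standing in for the paper's encoder/decoder networks and flow-prediction, cross-view, and view-adversarial losses, which are not reproduced here:

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the paper
INIT_LR, WEIGHT_DECAY = 1e-5, 5e-4
LR_DECAY_STEPS = 20000    # halve the learning rate every 20000 mini-batches
ADV_START = 5000          # enable view-adversarial training after 5000 steps
ALPHA, BETA = 0.5, 0.05   # loss-term weights, determined via cross-validation

# Hypothetical stand-ins for the paper's networks and losses
model = nn.Linear(64, 64)
def flow_loss(out): return out.pow(2).mean()
def cross_view_loss(out): return out.abs().mean()
def view_adversarial_loss(out): return out.mean()

optimizer = torch.optim.Adam(model.parameters(), lr=INIT_LR,
                             weight_decay=WEIGHT_DECAY)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                            step_size=LR_DECAY_STEPS, gamma=0.5)

for step in range(100):                  # stands in for iterating the data loader
    out = model(torch.randn(8, 64))      # mini-batch of size 8
    loss = flow_loss(out) + ALPHA * cross_view_loss(out)
    if step >= ADV_START:                # delayed so it does not distract flow prediction
        loss = loss + BETA * view_adversarial_loss(out)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()                     # StepLR halves the lr every 20000 steps
```

With `gamma=0.5` and `step_size=20000`, `StepLR` reproduces the reported "decrease the learning rate by half every 20000 steps" schedule; the delayed adversarial term mirrors the paper's activation after 5000 steps.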