Unsupervised Learning of View-invariant Action Representations

Authors: Junnan Li, Yongkang Wong, Qi Zhao, Mohan S. Kankanhalli

NeurIPS 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate the effectiveness of our learned representations for action recognition on multiple datasets. Our method outperforms state-of-the-art unsupervised methods across multiple datasets." (Section 4: Experiments)
Researcher Affiliation | Academia | Junnan Li, Grad. School for Integrative Sciences and Engineering, National University of Singapore, Singapore, lijunnan@u.nus.edu; Yongkang Wong, School of Computing, National University of Singapore, Singapore, yongkang.wong@nus.edu.sg; Qi Zhao, Dept. of Computer Science and Engineering, University of Minnesota, Minneapolis, USA, qzhao@cs.umn.edu; Mohan S. Kankanhalli, School of Computing, National University of Singapore, Singapore, mohan@comp.nus.edu.sg
Pseudocode | No | The paper describes the components of the learning framework and the optimization process but does not include any pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide a link to, or an explicit statement about releasing, the source code for its methodology.
Open Datasets | Yes | "We use the NTU RGB+D dataset [49] for unsupervised representation learning."
Dataset Splits | Yes | "For cross-subject evaluation, we follow the same training and testing split as in [49]. For cross-view evaluation, samples of cameras 2 and 3 are used for training while those of camera 1 are used for testing. Since we need at least two cameras for our unsupervised task, we randomly divide the supervised training set with a ratio of 8:1 for unsupervised training and testing." (An illustrative split sketch follows the table.)
Hardware Specification | No | The paper describes computational parameters such as mini-batch size and optimizer settings but does not provide specific details on the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions deep learning architectures and optimizers (e.g., ResNet-18, bi-directional convolutional LSTM, Adam optimizer) but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | "Implementation details. For Conv in the encoder and the depth CNN in the cross-view decoder, we employ the ResNet-18 architecture [15] up until the final convolution layer, and add a 1×1×64 convolutional layer to reduce the feature size. ... For BiLSTM, we use convolutional filters of size 7×7×64 for convolution with the input and hidden state. We initialize all weights following the method in [14]. During training, we use a mini-batch of size 8. We train the model using the Adam optimizer [20], with an initial learning rate of 1e-5 and a weight decay of 5e-4. We decrease the learning rate by half every 20000 steps (mini-batches). To avoid distracting the flow prediction task, we activate the view adversarial training after 5000 steps. The weights of the loss terms are set as α = 0.5 and β = 0.05, which are determined via cross-validation."
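
The 8:1 division quoted in the Dataset Splits row is a random partition of the supervised training set into unsupervised training and test portions. Below is a minimal sketch of such a split, assuming a list of sample identifiers; the variable names and placeholder IDs are illustrative and not taken from the paper.

```python
# Minimal sketch of an 8:1 random division of the supervised training set
# into unsupervised training and test portions. `sample_ids` is a placeholder
# for the actual NTU RGB+D sample identifiers.
import random

sample_ids = [f"S{i:06d}" for i in range(1000)]  # illustrative IDs only
random.seed(0)                                   # fixed seed so the split is reproducible
random.shuffle(sample_ids)

cut = len(sample_ids) * 8 // 9                   # 8 parts for training, 1 part for testing
unsup_train_ids = sample_ids[:cut]
unsup_test_ids = sample_ids[cut:]
```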
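
The Experiment Setup row quotes the paper's implementation details. The PyTorch sketch below shows one way those hyper-parameters could be wired together: a ResNet-18 backbone truncated before its pooling/classifier layers with a 1×1 convolution down to 64 channels, Adam with the reported learning rate and weight decay, a learning-rate halving every 20000 steps, and the loss weights α and β with the adversarial term activated after 5000 steps. The encoder class, the dummy input size, and the placeholder loss terms are assumptions for illustration, not the authors' released code.

```python
# Hedged PyTorch sketch of the reported training configuration; names and
# dummy losses are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torchvision.models as models


class Encoder(nn.Module):
    """ResNet-18 kept up to its final convolution block, followed by a 1x1
    convolution that reduces the feature maps to 64 channels."""

    def __init__(self):
        super().__init__()
        resnet = models.resnet18(weights=None)
        self.backbone = nn.Sequential(*list(resnet.children())[:-2])  # drop avgpool & fc
        self.reduce = nn.Conv2d(512, 64, kernel_size=1)

    def forward(self, x):
        return self.reduce(self.backbone(x))


encoder = Encoder()

# Adam with the reported learning rate and weight decay; halve the learning
# rate every 20000 mini-batches.
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-5, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20000, gamma=0.5)

alpha, beta = 0.5, 0.05      # loss weights reported in the paper
adv_start_step = 5000        # view adversarial training activated after this step
batch_size = 8               # mini-batch size reported in the paper

for step in range(10):       # dummy loop over random frames, for illustration only
    frames = torch.randn(batch_size, 3, 112, 112)
    features = encoder(frames)
    # The terms below stand in for the paper's flow prediction, cross-view
    # decoding, and view adversarial losses (schematic placeholders only).
    flow_loss = features.pow(2).mean()
    decode_loss = features.abs().mean()
    adv_loss = features.mean().abs()
    loss = flow_loss + alpha * decode_loss
    if step >= adv_start_step:
        loss = loss + beta * adv_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()
```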