PACE: Predictive and Contrastive Embedding for Unsupervised Action Segmentation

Authors: Jiahao Wang, Jie Qin, Yunhong Wang, Annan Li

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three challenging benchmarks demonstrate the superiority of our method, with up to 26.9% improvements in F1 score over the state of the art.
Researcher Affiliation | Academia | (1) State Key Laboratory of Virtual Reality Technology and System, Beihang University; (2) College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide explicit access information or links to source code for the described methodology.
Open Datasets | Yes | We evaluate the performance of PACE on three UAS benchmarks, namely Breakfast [Kuehne et al., 2014], 50Salads [Stein and McKenna, 2013] and YouTube Instructions (YTI) [Alayrac et al., 2016].
Dataset Splits | No | The paper uses well-known benchmark datasets (Breakfast, 50Salads, YTI) but does not explicitly provide details on training, validation, and test splits (e.g., percentages, sample counts, or specific predefined split names/citations).
Hardware Specification | Yes | All experiments are conducted with two NVIDIA RTX 3090 GPUs.
Software Dependencies | No | The paper mentions software like TensorFlow and SciPy but does not provide specific version numbers for these or other key software dependencies.
Experiment Setup | Yes | The encoder has 3 basic layers in total. There are 4 attention heads in SA, and the inner layer of the FFN is 2,048-d. We set the dimensionalities of both hidden representations (h) and contrastive embeddings (m) to 512. The encoder takes as input video sequences of 100 frames (i.e., n = 100), and each sequence is then divided into clips of length s = 5. We set the training batch size to 32. To increase the contrastive power in Lctrst, we expand negative samples with Cj from different sequences in the same batch. We empirically set α to 0.1 in order to maintain comparable scales of the two losses. We utilize the Adam [Kingma and Ba, 2015] optimizer with a learning rate of 0.0001. The total training epochs are 50, 30 and 10 on Breakfast, 50Salads and YTI, respectively, according to their scales.
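The reported hyperparameters can be collected in a short, stdlib-only Python sketch. The names (`PaceConfig`, `split_into_clips`) and the loss-combination line are illustrative assumptions, not the authors' released code; only the numeric values come from the paper.

```python
from dataclasses import dataclass

@dataclass
class PaceConfig:
    # Values as reported in the paper's experiment setup.
    num_layers: int = 3      # basic encoder layers
    num_heads: int = 4       # attention heads in self-attention (SA)
    ffn_dim: int = 2048      # inner-layer dimension of the FFN
    hidden_dim: int = 512    # hidden representations (h)
    embed_dim: int = 512     # contrastive embeddings (m)
    seq_len: int = 100       # frames per input sequence (n)
    clip_len: int = 5        # frames per clip (s)
    batch_size: int = 32
    alpha: float = 0.1       # weight on the contrastive loss
    lr: float = 1e-4         # Adam learning rate

def split_into_clips(frames, clip_len):
    """Divide a frame sequence into consecutive non-overlapping clips."""
    assert len(frames) % clip_len == 0
    return [frames[i:i + clip_len] for i in range(0, len(frames), clip_len)]

def total_loss(pred_loss, ctrst_loss, alpha):
    # Hypothetical combination consistent with the stated role of alpha:
    # alpha scales the contrastive term relative to the predictive term.
    return pred_loss + alpha * ctrst_loss

cfg = PaceConfig()
clips = split_into_clips(list(range(cfg.seq_len)), cfg.clip_len)
print(len(clips))  # 100 frames with clip length 5 -> 20 clips
```

With n = 100 and s = 5 each sequence yields 20 clips, so a batch of 32 sequences contributes 32 sets of clip representations, which is what allows negatives (Cj) to be drawn from other sequences in the same batch.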