Temporal and Object Quantification Networks

Authors: Jiayuan Mao, Zhezheng Luo, Chuang Gan, Joshua B. Tenenbaum, Jiajun Wu, Leslie Pack Kaelbling, Tomer D. Ullman

IJCAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate TOQ-Nets on two perceptually and conceptually different benchmarks: trajectory-based sport event detection and human activity recognition, demonstrating several important contributions. First, TOQ-Nets outperform both convolutional and recurrent baselines for modeling temporal-relational concepts across benchmarks. Second, by exploiting temporal-relational features learned through supervised learning, TOQ-Nets achieve strong few-shot generalization to novel actions. Finally, TOQ-Nets exhibit strong generalization to scenarios with more entities than were present during training and are robust w.r.t. time-warped input trajectories.
Researcher Affiliation | Collaboration | Jiayuan Mao (1), Zhezheng Luo (1), Chuang Gan (2), Joshua B. Tenenbaum (1), Jiajun Wu (3), Leslie Pack Kaelbling (1), and Tomer D. Ullman (4); (1) Massachusetts Institute of Technology, (2) MIT-IBM Watson AI Lab, (3) Stanford University, (4) Harvard University
Pseudocode | Yes | Algorithm 1: "An example temporal structure that the second temporal reasoning layer can recognize." (An illustrative sketch of temporal quantification appears after this table.)
Open Source Code | No | The paper does not provide an explicit statement about releasing the source code for the described method, nor a link to a code repository. It mentions "We provide details about our implementation and how we choose the model configurations in the supplementary material", but this does not guarantee code availability. The only link provided is for a third-party simulator, not the authors' implementation.
Open Datasets | Yes | We collect training and evaluation datasets based on the gfootball simulator, which provides a physics-based 3D football simulation. ... Toyota Smarthome [Das et al., 2019] is a dataset that contains videos of humans performing everyday activities... The volleyball dataset [Ibrahim et al., 2016] contains 4830 video clips...
Dataset Splits | Yes | Among the generated examples, 62% (2,462 of 3,077) are used for training, 15% are used for validation, and 23% are used for testing. (Soccer event detection) ... We split frames into training (9.9k), validation (2.5k), and testing (3.6k). (Toyota Smarthome) ... following the original split, i.e., 24, 15, and 16 of 55 videos are used for training, validation, and testing. (Volleyball Activity) (A sketch of the soccer split fractions appears after this table.)
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments, such as specific GPU or CPU models, or cloud resources.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., "Python 3.8, PyTorch 1.9"). It mentions algorithms and network types, but not the specific software implementations used.
Experiment Setup | Yes | We train our models with an Adam optimizer with a learning rate of 0.0001, and a batch size of 32. We apply early stopping for training when the validation accuracy does not improve for 100 epochs. (A sketch of this training setup appears after this table.)