MABe22: A Multi-Species Multi-Task Benchmark for Learned Representations of Behavior

Authors: Jennifer J. Sun, Markus Marks, Andrew Wesley Ulmer, Dipam Chakraborty, Brian Geuther, Edward Hayes, Heng Jia, Vivek Kumar, Sebastian Oleszko, Zachary Partridge, Milan Peelman, Alice Robie, Catherine E Schretter, Keith Sheppard, Chao Sun, Param Uttarwar, Julian Morgan Wagner, Erik Werner, Joseph Parker, Pietro Perona, Yisong Yue, Kristin Branson, Ann Kennedy

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test multiple state-of-the-art self-supervised video and trajectory representation learning methods to demonstrate the use of our benchmark, revealing that methods developed using human action datasets do not fully translate to animal datasets. We perform a large set of experiments to evaluate the performance of representation learning methods on MABe 2022 (Sections 5.1, 5.2).
Researcher Affiliation | Collaboration | Caltech, Northwestern University, AICrowd, JAX Labs, Zhejiang University, IRLAB Therapeutics, University of New South Wales, Ghent University, Janelia, Saarland University.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Our dataset and related code are available at https://sites.google.com/view/computational-behavior/our-datasets/mabe2022-dataset; links to the code and dataset, including code from challenge winners where available, are collected on that dataset website.
Open Datasets | Yes | We introduce MABe22, a large-scale, multi-agent video and trajectory benchmark to assess the quality of learned behavior representations. The dataset is available on the Caltech public data repository at https://data.caltech.edu/records/20186 (DOI: https://doi.org/10.22002/D1.20186), where it will be retained indefinitely and remain available for download by all third parties; dataset and related code are also linked from https://sites.google.com/view/computational-behavior/our-datasets/mabe2022-dataset.
Dataset Splits | Yes | The dataset includes a recommended train/test split, which was used for the Multi-Agent Behavior Challenge. Data was randomly split into training, test, and private-test sets (the private test set was withheld from challenge evaluation until the end of the competition period, to avoid overfitting). The data are split into four sets, each containing distinct videos and flies: User train (30%), data given to competitors to learn their embedding...; Evaluation train (50%), data used to train the linear classifiers during evaluation; Test 1 (10%), data used to measure performance...; Test 2 (10%), final set of data used to measure performance... A minimal sketch of this split scheme appears after the table.
Hardware Specification | No | The paper describes hardware used for data collection (e.g., camera types, lenses, LED panels) but does not specify the hardware (e.g., GPU models, CPU types) used for running the computational experiments or training models. It mentions "batch size is 128 per GPU" but no specific GPU model.
Software Dependencies | No | The paper mentions various software tools and optimizers (e.g., HRNet, VIA video annotator, BORIS, Fly Tracker, APT, JAABA, Adam, SGD, AdamW, SimCLR, MoCo, BERT, GPT, Perceiver, PointNet, Scikit-learn, CPLEX, Gecode, Choco). However, it does not provide specific version numbers for these software dependencies, only references to the papers that introduce them.
Experiment Setup | Yes | For video representation learning models, we use D = 128. For trajectory methods, we use D = 128 for mice and D = 256 for flies. We use linear least squares with l2-regularized (Ridge) classification/regression as the model and F1 / mean-squared-error (MSE) as evaluation metrics (see Appendix D for details). Appendix E contains detailed training parameters for MAE (Table 13), MaskFeat (Table 14), and ρBYOL (Table 15), including optimizer type, learning rates, batch sizes, epochs, and augmentation strategies. A hedged sketch of this linear evaluation follows the split example below.
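
The 30/50/10/10 split scheme described in the Dataset Splits row can be summarized in a few lines. The following is a minimal sketch under stated assumptions, not the challenge's actual code: the function and set names (split_videos, user_train, etc.) are illustrative, and a seeded shuffle of video IDs stands in for the paper's splitting by distinct videos and flies.

```python
# Sketch of the paper's four-way split: 30% user train, 50% evaluation
# train, 10% test 1, 10% test 2 (private test). Names are hypothetical.
import random

def split_videos(video_ids, seed=0):
    """Partition video IDs into the four evaluation sets."""
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)          # assumed: a simple seeded shuffle
    n = len(ids)
    n_user = int(0.30 * n)                    # user train: learn the embedding
    n_eval = int(0.50 * n)                    # evaluation train: fit linear probes
    n_t1 = int(0.10 * n)                      # test 1: public evaluation
    return {
        "user_train": ids[:n_user],
        "eval_train": ids[n_user:n_user + n_eval],
        "test_1": ids[n_user + n_eval:n_user + n_eval + n_t1],
        "test_2": ids[n_user + n_eval + n_t1:],  # withheld private test set
    }

splits = split_videos(range(1000))
```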
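
Likewise, the linear evaluation in the Experiment Setup row maps directly onto Scikit-learn, which the paper mentions. The sketch below is an illustration under assumptions, not the authors' evaluation code: the regularization strength alpha and the synthetic embeddings and labels are placeholders, and the paper's exact protocol is in its Appendix D.

```python
# Linear probe on frozen D-dimensional embeddings: l2-regularized (Ridge)
# classification scored with F1, and Ridge regression scored with MSE.
import numpy as np
from sklearn.linear_model import Ridge, RidgeClassifier
from sklearn.metrics import f1_score, mean_squared_error

rng = np.random.default_rng(0)
D = 128                                  # embedding dim for video models and mouse trajectories
X = rng.standard_normal((600, D))        # stand-in for frozen learned embeddings
y_cls = rng.integers(0, 2, 600)          # a binary behavior label (placeholder)
y_reg = rng.standard_normal(600)         # a continuous target (placeholder)

train, test = slice(0, 500), slice(500, 600)

clf = RidgeClassifier(alpha=1.0).fit(X[train], y_cls[train])   # alpha is assumed
print("F1: ", f1_score(y_cls[test], clf.predict(X[test])))

reg = Ridge(alpha=1.0).fit(X[train], y_reg[train])
print("MSE:", mean_squared_error(y_reg[test], reg.predict(X[test])))
```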