Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Chirality in Action: Time-Aware Video Representation Learning by Latent Straightening
Authors: Piyush Nitin Bagad, Andrew Zisserman
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our model is based on an auto-encoder with a latent space with inductive bias inspired by perceptual straightening. We show that this results in a compact but time-sensitive video representation for the proposed task across three datasets: Something-Something, EPIC-Kitchens, and Charade. Our method (i) outperforms much larger video models pre-trained on large-scale video datasets, and (ii) leads to an improvement in classification performance on standard benchmarks when combined with these existing models. In this section, we first present results on our proposed chiral action recognition task. Then, we explore more general action recognition tasks where our descriptor is useful. |
| Researcher Affiliation | Academia | Piyush Bagad, Andrew Zisserman. VGG, Dept. of Engineering Science, University of Oxford |
| Pseudocode | No | The paper describes the LiFT model architecture and training objective using descriptive text and mathematical equations in Section 2 and visual diagrams in Figure 2 and Figure 11, but does not include a structured pseudocode or algorithm block. |
| Open Source Code | No | Likewise, the code and trained model will be released as open-source. |
| Open Datasets | Yes | In this work, we mine chiral pairs from three existing datasets (SSv2 [28], EPIC-Kitchens [17], and Charades [74]), to set up a chiral evaluation benchmark. ... The proposed CiA meta-dataset is based on three public datasets with instructions for construction provided. CiA meta-dataset will be publicly released. We will also provide it along with the supplementary material. |
| Dataset Splits | Yes | For each chiral group, we split the videos into train and test sets following the split defined in the original dataset. Some basic numbers for each dataset are provided in Table 1 and visual examples are shown in Fig. 4. Table 7: CiA dataset size. For each of the constituent datasets, we show the total number of videos in the proposed chiral split and also the average number of videos per chiral group. |
| Hardware Specification | Yes | We will state the compute resources used in the Supplemental. Briefly, we use various consumer-grade GPUs (e.g., Nvidia RTX A4000, Tesla P40, Quadro RTX 8000, NVIDIA RTX A6000). ... This feature computation is run on 4 NVIDIA RTX A4000 16GB GPUs in parallel. ... Once features are computed, LiFT is trained on a single consumer-grade GPU (e.g., NVIDIA RTX A4000, Tesla P40, Quadro RTX 8000, NVIDIA RTX A6000). |
| Software Dependencies | No | The paper mentions using a GELU activation [34] and LayerNorm [2], and training with the Adam optimizer [41] and an LRPlateau scheduler. However, it does not provide specific version numbers for the software libraries or frameworks used (e.g., PyTorch, TensorFlow, scikit-learn versions). |
| Experiment Setup | Yes | We linearly sample T=16 frames from each video and compute the features ahead of training. The input feature sequence is first projected to the space of the Encoder $\mathbb{R}^D \rightarrow \mathbb{R}^d$; we choose d=384. The Encoder is a standard Transformer [14] with 4 layers and 8 attention heads each. ... The model is trained for 500 epochs with a batch size of 128 with Adam optimizer [41] with a learning rate of 0.001 and a LRPlateau scheduler. ... In case of non-linear probing, it is a two layer MLP with 512 hidden dimensions with ReLU non-linearity and Dropout of 0.1. ... We train the probe for 100 epochs using Adam optimizer with learning rate of 1e-5 and LRPlateau scheduler. |
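
The Experiment Setup row quotes concrete hyperparameters and a linear frame-sampling step. The following is a minimal sketch, not the authors' code, of how those values might be recorded and how T=16 frame indices could be sampled. The names `LiFTSetup` and `linear_frame_indices` are hypothetical, and the convention of including both the first and last frame is an assumption; the paper excerpt does not specify endpoint handling.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LiFTSetup:
    """Training values quoted in the table above (field names are hypothetical)."""
    num_frames: int = 16         # T, frames sampled per video
    latent_dim: int = 384        # d, encoder width
    encoder_layers: int = 4
    attention_heads: int = 8
    epochs: int = 500
    batch_size: int = 128
    learning_rate: float = 1e-3  # Adam


def linear_frame_indices(total_frames: int, num_frames: int = 16) -> List[int]:
    """Return evenly spaced frame indices spanning the whole video.

    Assumption: the first and last frames are both included. Videos with
    fewer than `num_frames` frames yield repeated indices.
    """
    if total_frames < 1 or num_frames < 1:
        raise ValueError("total_frames and num_frames must be positive")
    if num_frames == 1:
        return [0]
    step = (total_frames - 1) / (num_frames - 1)
    return [round(i * step) for i in range(num_frames)]


if __name__ == "__main__":
    setup = LiFTSetup()
    print(linear_frame_indices(total_frames=120, num_frames=setup.num_frames))
```

Other sampling conventions (for example, taking the centre of each of T equal segments) would be equally consistent with the quoted description.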