Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Authors: Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein
AAAI 2021, pp. 6644-6652 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate how our noise estimation can be broadly integrated and achieves comparable results to state-of-the-art performance on five different benchmark datasets for two challenging multimodal tasks: Video Question Answering and Text-To-Video Retrieval. Furthermore, we provide a theoretical probabilistic error bound substantiating our empirical results and analyze failure cases. |
| Researcher Affiliation | Collaboration | Elad Amrani¹,², Rami Ben-Ari¹, Daniel Rotman¹, Alex Bronstein²; ¹IBM Research AI, ²Technion |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper. |
| Open Source Code | Yes | Code: https://github.com/elad-amrani/ssml. |
| Open Datasets | Yes | Ultimately, we integrate our proposed building block into an embedding model and learn superior joint video-text representations that achieve comparable state-of-the-art performance on five datasets: MSRVTT (Xu et al. 2016), LSMDC (Rohrbach et al. 2015), MSVD (Chen and Dolan 2011), MSRVTT-QA (Xu et al. 2017) and MSVD-QA (Xu et al. 2017); for two different tasks: Video Question Answering and Text to Video Retrieval. We train our model using the HowTo100M (Miech et al. 2019) narrated video dataset |
| Dataset Splits | No | The paper mentions training and evaluation on specific datasets (e.g., MSRVTT, LSMDC, MSVD, HowTo100M) and states 'See extended version (Amrani et al. 2020) for detailed statistics of each dataset,' implying that specific dataset splits are not detailed within the main paper. |
| Hardware Specification | Yes | Training the model on the large HowTo100M dataset is done on a single V100 GPU and takes less than 24 hours. |
| Software Dependencies | No | The paper mentions several software components and models (e.g., word2vec, ADAM optimizer, FAISS, Resnet-152, ResNeXt-101) with citations, but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We use d_v = 4096, d_c = 300, and d = 6144. We use the ADAM (Kingma and Ba 2015) optimizer with a fixed learning rate of 10⁻³. (Illustrative sketches of this setup and of the paper's noise-estimation idea follow the table.) |
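The "Research Type" row quotes the paper's central idea: unsupervised noise estimation for weakly paired video-text data via density estimation in the joint embedding space. Below is a minimal sketch of that idea, assuming a k-NN density score over in-batch cosine similarities; the function name, `k`, and the min-max rescaling are illustrative assumptions, not the authors' exact estimator.

```python
import torch
import torch.nn.functional as F

def knn_density_weights(text_emb: torch.Tensor,
                        video_emb: torch.Tensor,
                        k: int = 8) -> torch.Tensor:
    """Score (video, text) pairs by k-NN density; denser means likely clean.

    Minimal sketch only: the paper's estimator and normalization differ,
    and `k` is an illustrative choice, not the authors' value.
    """
    text_emb = F.normalize(text_emb, dim=-1)    # (N, d) unit vectors
    video_emb = F.normalize(video_emb, dim=-1)  # (N, d) unit vectors

    # Cosine similarity of every caption to every clip in the batch.
    sim = text_emb @ video_emb.t()              # (N, N)

    # Density estimate: mean similarity to the k most similar clips.
    knn_sim, _ = sim.topk(k, dim=-1)            # (N, k)
    density = knn_sim.mean(dim=-1)              # (N,)

    # Rescale to [0, 1] so the scores can weight a per-pair training loss.
    d_min, d_max = density.min(), density.max()
    return (density - d_min) / (d_max - d_min + 1e-8)
```

Scores like these would weight each pair's contribution to the ranking loss, so that pairs in low-density regions (likely misaligned narration) pull less on the gradient.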
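The "Software Dependencies" row notes that FAISS is cited without a version number. For context, here is a hedged example of the kind of nearest-neighbor query that a density estimate over a large narrated-video corpus would rely on; the index type and neighborhood size are assumptions, since the paper does not specify them.

```python
import numpy as np
import faiss  # version unpinned in the paper, as the table notes

d = 6144                       # joint embedding dimension from the paper
emb = np.random.rand(10000, d).astype("float32")
faiss.normalize_L2(emb)        # unit norm so inner product = cosine

index = faiss.IndexFlatIP(d)   # exact inner-product index; an assumption,
index.add(emb)                 # as the paper does not state the index type

# k nearest neighbors per embedding (the first hit is the query itself).
sims, ids = index.search(emb, 9)
```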
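For the "Experiment Setup" row, a minimal PyTorch sketch wiring the reported dimensions to the reported optimizer; the linear projection heads are stand-ins for the paper's embedding model, which is not reproduced here.

```python
import torch

# Dimensions quoted in the 'Experiment Setup' row:
# video features d_v = 4096, word embeddings d_c = 300, joint space d = 6144.
D_V, D_C, D = 4096, 300, 6144

# Stand-in projection heads; the paper's actual embedding model is richer.
video_proj = torch.nn.Linear(D_V, D)
text_proj = torch.nn.Linear(D_C, D)

# Optimizer as reported: ADAM with a fixed learning rate of 1e-3.
optimizer = torch.optim.Adam(
    list(video_proj.parameters()) + list(text_proj.parameters()),
    lr=1e-3,
)
```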