Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning

Authors: Elad Amrani, Rami Ben-Ari, Daniel Rotman, Alex Bronstein (pp. 6644-6652)

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We demonstrate how our noise estimation can be broadly integrated and achieves comparable results to state-of-the-art performance on five different benchmark datasets for two challenging multimodal tasks: Video Question Answering and Text-To-Video Retrieval. Furthermore, we provide a theoretical probabilistic error bound substantiating our empirical results and analyze failure cases."
Researcher Affiliation | Collaboration | Elad Amrani (IBM Research AI, Technion), Rami Ben-Ari (IBM Research AI), Daniel Rotman (IBM Research AI), Alex Bronstein (Technion)
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | Yes | Code: https://github.com/elad-amrani/ssml
Open Datasets | Yes | "Ultimately, we integrate our proposed building block into an embedding model and learn superior joint video-text representations that achieve comparable state-of-the-art performance on five datasets: MSRVTT (Xu et al. 2016), LSMDC (Rohrbach et al. 2015), MSVD (Chen and Dolan 2011), MSRVTT-QA (Xu et al. 2017) and MSVD-QA (Xu et al. 2017); for two different tasks: Video Question Answering and Text to Video Retrieval." The model is trained on the HowTo100M (Miech et al. 2019) narrated video dataset.
Dataset Splits | No | The paper trains and evaluates on specific datasets (e.g., MSRVTT, LSMDC, MSVD, HowTo100M) and states "See extended version (Amrani et al. 2020) for detailed statistics of each dataset," so specific dataset splits are not detailed in the main paper.
Hardware Specification | Yes | "Training the model on the large HowTo100M dataset is done on a single V100 GPU and takes less than 24 hours."
Software Dependencies | No | The paper mentions several software components and models (e.g., word2vec, the ADAM optimizer, FAISS, ResNet-152, ResNeXt-101) with citations, but does not provide version numbers for these dependencies.
Experiment Setup | Yes | "We use d_v = 4096, d_c = 300, and d = 6144. We use the ADAM (Kingma and Ba 2015) optimizer with a fixed learning rate of 10^-3."
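The reported setup (embedding dimensions d_v, d_c, d and a fixed ADAM learning rate) can be captured in a small configuration sketch. The variable names and dictionary layout below are illustrative assumptions, not taken from the paper or the released code; only the numeric values are quoted from the experiment setup:

```python
# Hyperparameters quoted from the paper's experiment setup.
# The config structure and names are illustrative assumptions.
D_V = 4096            # d_v: video feature dimension
D_C = 300             # d_c: caption (word2vec) embedding dimension
D = 6144              # d: joint embedding dimension
LEARNING_RATE = 1e-3  # fixed ADAM learning rate (10^-3)

config = {
    "d_v": D_V,
    "d_c": D_C,
    "d": D,
    "optimizer": "adam",
    "lr": LEARNING_RATE,
}

print(config)
```

A sketch like this would typically be passed to the model constructor and optimizer factory; the paper itself only reports the values, not how they are wired into the training script.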