MultiMax: Sparse and Multi-Modal Attention Learning

Authors: Yuxuan Zhou, Mario Fritz, Margret Keuper

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive analysis and evaluation, we show that MultiMax successfully produces a distribution that suppresses irrelevant entries while preserving multi-modality, with benefits in image classification, language modeling and machine translation.
Researcher Affiliation | Academia | 1 University of Mannheim, Germany; 2 CISPA Helmholtz Center for Information Security, Germany; 3 Max Planck Institute for Informatics, Saarland Informatics Campus, Germany.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides mathematical definitions and equations but no procedural algorithms.
Open Source Code | Yes | The code is available at https://github.com/ZhouYuxuanYX/MultiMax.
Open Datasets | Yes | We test the effectiveness of our MultiMax further on the Language Modeling task on WikiText-103 (Merity et al., 2016) using a 6-layer Transformer Decoder with 156M parameters.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or explicit methodology for validation splits) needed to reproduce the data partitioning. It mentions following existing training settings but does not detail the splits within this paper.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions software such as PyTorch and the fairseq repository but does not provide specific version numbers for any key software components.
Experiment Setup | Yes | The implementation is based on the official fairseq repository and the training setup is kept as default, i.e., a 5e-4 learning rate with a maximum of 2048 tokens per GPU for 50k iterations on 4 GPUs.
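
For readers who want to approximate the Experiment Setup row above, the following is a minimal sketch of how such a fairseq language-modeling run could be launched. Only the 5e-4 learning rate, the 2048 max tokens per GPU, and the 50k updates come from the paper's text; the data path, architecture name, optimizer, and save directory are assumptions, not the authors' exact command.

```python
import subprocess

# Hypothetical reconstruction of the described fairseq training run.
# Only --lr, --max-tokens, and --max-update reflect values quoted in the
# paper; the data path, architecture, optimizer, and save directory are
# placeholders and may differ from the authors' actual configuration.
cmd = [
    "fairseq-train", "data-bin/wikitext-103",  # preprocessed WikiText-103 (assumed path)
    "--task", "language_modeling",
    "--arch", "transformer_lm",                # a 6-layer decoder variant is assumed
    "--optimizer", "adam",
    "--lr", "5e-4",                            # learning rate stated in the paper
    "--max-tokens", "2048",                    # max tokens per GPU, as stated
    "--max-update", "50000",                   # 50k iterations, as stated
    "--save-dir", "checkpoints/multimax_lm",   # arbitrary output directory
]

# The paper reports 4 GPUs; fairseq distributes over all visible CUDA devices,
# so restrict with CUDA_VISIBLE_DEVICES if needed.
subprocess.run(cmd, check=True)
```

This sketch assumes fairseq is installed and WikiText-103 has already been binarized with fairseq-preprocess; it is meant only to illustrate the stated hyperparameters, not to reproduce the paper's results exactly.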