MultiMax: Sparse and Multi-Modal Attention Learning

Authors: Yuxuan Zhou, Mario Fritz, Margret Keuper

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through comprehensive analysis and evaluation, we show that MultiMax successfully produces a distribution that suppresses irrelevant entries while preserving multi-modality, with benefits in image classification, language modeling and machine translation.
Researcher Affiliation | Academia | 1 University of Mannheim, Germany; 2 CISPA Helmholtz Center for Information Security, Germany; 3 Max Planck Institute for Informatics, Saarland Informatics Campus, Germany.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides mathematical definitions and equations but no procedural algorithms.
Open Source Code | Yes | The code is available at https://github.com/ZhouYuxuanYX/MultiMax.
Open Datasets | Yes | We test the effectiveness of our MultiMax further on the Language Modeling task on WikiText-103 (Merity et al., 2016) using a 6-layer Transformer Decoder with 156M parameters.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or explicit methodology for validation splits) needed to reproduce the data partitioning. It mentions following existing training settings but does not detail the splits within this paper.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments.
Software Dependencies | No | The paper mentions software such as PyTorch and the fairseq repository but does not provide specific version numbers for any key software components.
Experiment Setup | Yes | The implementation is based on the official fairseq repository and the training setup is kept as default, i.e., a 5e-4 learning rate with a maximum of 2048 tokens per GPU for 50k iterations on 4 GPUs.
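
For readers who want to approximate the Experiment Setup row above, the following is a minimal sketch of how such a fairseq language-modeling run could be launched. Only the 5e-4 learning rate, the 2048 max tokens per GPU, and the 50k updates come from the paper's text; the data path, architecture name, optimizer, and save directory are assumptions, not the authors' exact command.

```python
import subprocess

# Hypothetical reconstruction of the described fairseq training run.
# Only --lr, --max-tokens, and --max-update reflect values quoted in the
# paper; the data path, architecture, optimizer, and save directory are
# placeholders and may differ from the authors' actual configuration.
cmd = [
    "fairseq-train", "data-bin/wikitext-103",  # preprocessed WikiText-103 (assumed path)
    "--task", "language_modeling",
    "--arch", "transformer_lm",                # a 6-layer decoder variant is assumed
    "--optimizer", "adam",
    "--lr", "5e-4",                            # learning rate stated in the paper
    "--max-tokens", "2048",                    # max tokens per GPU, as stated
    "--max-update", "50000",                   # 50k iterations, as stated
    "--save-dir", "checkpoints/multimax_lm",   # arbitrary output directory
]

# The paper reports 4 GPUs; fairseq distributes over all visible CUDA devices,
# so restrict with CUDA_VISIBLE_DEVICES if needed.
subprocess.run(cmd, check=True)
```

This sketch assumes fairseq is installed and WikiText-103 has already been binarized with fairseq-preprocess; it is meant only to illustrate the stated hyperparameters, not to reproduce the paper's results exactly.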