MultiMax: Sparse and Multi-Modal Attention Learning
Authors: Yuxuan Zhou, Mario Fritz, Margret Keuper
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through comprehensive analysis and evaluation, we show that MultiMax successfully produces a distribution that suppresses irrelevant entries while preserving multimodality, with benefits in image classification, language modeling and machine translation. (An illustrative sketch of this behaviour is given after the table.) |
| Researcher Affiliation | Academia | 1 University of Mannheim, Germany; 2 CISPA Helmholtz Center for Information Security, Germany; 3 Max Planck Institute for Informatics, Saarland Informatics Campus, Germany. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It provides mathematical definitions and equations but no procedural algorithms. |
| Open Source Code | Yes | The code is available at https://github.com/ZhouYuxuanYX/MultiMax. |
| Open Datasets | Yes | We test the effectiveness of our MultiMax further on the Language Modeling task on WikiText-103 (Merity et al., 2016) using a 6-layer Transformer Decoder with 156M parameters. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or explicit methodology for validation splits) needed to reproduce the data partitioning. It mentions following existing training settings but does not detail the splits within this paper. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running its experiments. |
| Software Dependencies | No | The paper mentions software like "PyTorch" and the "fairseq repository" but does not provide specific version numbers for any key software components. |
| Experiment Setup | Yes | The implementation is based on the official fairseq repository and the training setup is kept as default, i.e., 5e-4 learning rate with a maximum of 2048 tokens per GPU for 50k iterations on 4 GPUs. (A hedged launch sketch follows the table.) |
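
The Pseudocode row notes that the paper gives only mathematical definitions. As a rough illustration of the behaviour quoted in the Research Type row, the PyTorch sketch below applies a learnable piecewise-linear modulation to the logits before a standard softmax, so that small scores are pushed down (sparsity) while several large scores can keep comparable mass (multi-modality). The class name, the thresholds `t_low`/`t_high`, and the slope parameters are illustrative assumptions, not the authors' exact MultiMax parameterization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PiecewiseModulatedSoftmax(nn.Module):
    """Sketch only (not the paper's exact formulation): a softmax whose logits
    are reshaped by a learnable piecewise-linear modulator, suppressing entries
    below a low threshold while letting several large entries retain mass."""

    def __init__(self, t_low: float = -1.0, t_high: float = 1.0):
        super().__init__()
        # Hypothetical parameters: slopes for the low- and high-score regions.
        self.s_low = nn.Parameter(torch.tensor(2.0))   # steeper -> stronger suppression
        self.s_high = nn.Parameter(torch.tensor(0.5))  # gentle lift for large scores
        self.t_low = t_low
        self.t_high = t_high

    def forward(self, x: torch.Tensor, dim: int = -1) -> torch.Tensor:
        # Piecewise-linear modulation of the logits before the softmax.
        modulated = (
            x
            - self.s_low * F.relu(self.t_low - x)    # push small scores further down
            + self.s_high * F.relu(x - self.t_high)  # slightly amplify large scores
        )
        return F.softmax(modulated, dim=dim)


if __name__ == "__main__":
    scores = torch.tensor([[3.0, 2.9, 0.1, -0.5, -2.0]])
    print(F.softmax(scores, dim=-1))            # baseline softmax
    print(PiecewiseModulatedSoftmax()(scores))  # sparser tail, both large modes kept
```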
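
The Experiment Setup row quotes the fairseq defaults for the WikiText-103 language-modeling run (learning rate 5e-4, at most 2048 tokens per GPU, 50k updates on 4 GPUs). Below is a hedged launch sketch; only those hyperparameters come from the paper, while the data path, architecture name, save directory, and remaining flags are assumptions about a typical fairseq language-modeling invocation.

```python
import subprocess

# Hedged sketch of a fairseq language-modeling run matching the quoted setup.
# Only --lr 5e-4, --max-tokens 2048 (per GPU) and --max-update 50000 are taken
# from the paper; the paths and the architecture choice are assumptions.
cmd = [
    "fairseq-train", "data-bin/wikitext-103",   # assumed preprocessed data path
    "--task", "language_modeling",
    "--arch", "transformer_lm",                 # assumed decoder-only architecture name
    "--optimizer", "adam",
    "--lr", "5e-4",
    "--max-tokens", "2048",                     # maximum tokens per GPU
    "--max-update", "50000",                    # 50k training iterations
    "--save-dir", "checkpoints/multimax_lm",    # assumed output directory
]

# Assumes 4 GPUs are visible, e.g. CUDA_VISIBLE_DEVICES=0,1,2,3.
subprocess.run(cmd, check=True)
```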