Sparse and Structured Hopfield Networks
Authors: Saul José Rodrigues dos Santos, Vlad Niculae, Daniel C. McNamee, André Martins
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach. ... Experiments on synthetic and real-world tasks (multiple instance learning and text rationalization) showcase the usefulness of our proposed models using various kinds of sparse and structured transformations (§5). |
| Researcher Affiliation | Collaboration | 1Instituto Superior Técnico, Universidade de Lisboa, Lisbon, Portugal 2Instituto de Telecomunicações, Lisbon, Portugal 3Language Technology Lab, University of Amsterdam, The Netherlands 4Champalimaud Research, Lisbon, Portugal 5Unbabel, Lisbon, Portugal. |
| Pseudocode | Yes | Algorithm 1: Compute α-normmax by bisection. (Appendix B) A bisection sketch follows the table. |
| Open Source Code | Yes | Our code is available on https://github.com/deep-spin/SSHN (Footnote 1) |
| Open Datasets | Yes | We next investigate how often our Hopfield networks converge to metastable states, a crucial aspect for understanding the network's dynamics. To elucidate this, we examine $\hat{y}_\Omega(\beta X q^{(t)})$ for the MNIST dataset, probing the number of nonzeros of these vectors. ... We run these models for K-MIL problems in the MNIST dataset (choosing 9 as target) and in three MIL benchmarks: Elephant, Fox, and Tiger (Ilse et al., 2018). ... The MIL benchmark datasets (Fox, Tiger and Elephant) comprise preprocessed and segmented color images sourced from the Corel dataset (Ilse et al., 2018). An update sketch that probes these nonzero counts follows the table. |
| Dataset Splits | Yes | We use 500 bags for testing and 500 bags for validation. (Appendix F.1) ... Model validation was conducted through a 10-fold nested cross-validation, repeated five times with different data splits, where the first seed is used for hyperparameter tuning. (Appendix F.2) A cross-validation sketch follows the table. |
| Hardware Specification | No | The paper does not explicitly describe any specific hardware used for running the experiments (e.g., GPU/CPU models, memory specifications, or cloud computing instances). |
| Software Dependencies | No | The paper mentions software components like 'Adam optimizer' and 'transformer attention' but does not provide specific version numbers for any of them, which is required for reproducibility. |
| Experiment Setup | Yes | We train the models for 5 different random seeds, where the first one is used for tuning the hyperparameters. ... The hyperparameters are tuned via grid search, where the grid space is shown in Table 5. ... All models were trained for 50 epochs. We incorporated an early-stopping mechanism, with patience 5, that selects the optimal checkpoint based on performance on the validation set. (Appendix F.1) ... We used a head dimension of 200, ... and a head dropout of 0.5 ... We used a single attention head ... Additionally we use a transition score of 0.001 and a train temperature of 0.1. (Appendix F.3) An early-stopping sketch follows the table. |
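The bisection pattern behind the paper's Algorithm 1 is easy to illustrate. The α-normmax update itself is not reproduced here; instead, a minimal NumPy sketch applies the same bracketing-and-halving scheme to sparsemax, where the threshold τ must satisfy Σᵢ max(zᵢ − τ, 0) = 1 (the function name and iteration count are our choices, not the authors'):

```python
import numpy as np

def sparsemax_bisect(z, n_iter=50):
    """Find tau with sum(max(z - tau, 0)) == 1 by bisection.

    Illustrates the bracketing-and-halving pattern of the paper's
    Algorithm 1 (alpha-normmax); the normmax fixed-point equation
    itself differs, but the search loop has the same shape.
    """
    z = np.asarray(z, dtype=float)
    lo, hi = z.max() - 1.0, z.max()   # mass(lo) >= 1, mass(hi) == 0
    for _ in range(n_iter):
        tau = (lo + hi) / 2.0
        mass = np.maximum(z - tau, 0.0).sum()
        if mass < 1.0:
            hi = tau   # tau too high (mass below 1): lower the bracket
        else:
            lo = tau   # tau too low (mass at least 1): raise the bracket
    return np.maximum(z - (lo + hi) / 2.0, 0.0)

print(sparsemax_bisect([2.0, 1.0, -0.5]))  # ~[1. 0. 0.], a sparse vector
```

Sparsemax also has an exact sorting-based algorithm, but bisection generalizes to transformations, like normmax, whose thresholds have no closed form.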
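The metastable-state probe quoted in the Open Datasets row counts the nonzeros of $\hat{y}_\Omega(\beta X q^{(t)})$. Below is a minimal sketch of one update in this family of Hopfield networks, with exact sparsemax standing in for the general $\hat{y}_\Omega$; the toy memories and function names are ours, not the authors':

```python
import numpy as np

def sparsemax(z):
    """Exact sparsemax via sorting (Martins & Astudillo, 2016)."""
    z_sorted = np.sort(z)[::-1]
    k = np.arange(1, len(z) + 1)
    support = 1 + k * z_sorted > np.cumsum(z_sorted)
    k_max = k[support][-1]
    tau = (np.cumsum(z_sorted)[k_max - 1] - 1) / k_max
    return np.maximum(z - tau, 0.0)

def hopfield_update(X, q, beta=1.0):
    """One update q <- X^T p with p = sparsemax(beta * X q).

    The nonzero count of p is the probe used in the paper: a single
    nonzero means one memory is retrieved exactly, while several
    nonzeros indicate a metastable blend of memories.
    """
    p = sparsemax(beta * (X @ q))
    return X.T @ p, np.count_nonzero(p)

rng = np.random.default_rng(0)
X = rng.standard_normal((3, 8))        # three memory patterns (rows)
q, nnz = hopfield_update(X, X[0], beta=4.0)
print(nnz)                             # high beta: sharp retrieval, few nonzeros
```

Lowering β, or swapping sparsemax for softmax, spreads the weight over several memories, which is exactly how metastable states show up in this probe.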
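One plausible reading of the quoted evaluation protocol (10-fold nested cross-validation, five repetitions with different splits, hyperparameters tuned only on the first seed), expressed in scikit-learn terms; the SVC classifier, its grid, and the random data are placeholders, not the paper's Hopfield models:

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.svm import SVC

X, y = np.random.randn(200, 10), np.random.randint(0, 2, 200)  # stand-in data

best_params, scores = None, []
for seed in range(5):                        # five repetitions, new splits
    outer = KFold(n_splits=10, shuffle=True, random_state=seed)
    if seed == 0:                            # tune only on the first seed
        inner = KFold(n_splits=10, shuffle=True, random_state=seed)
        search = GridSearchCV(SVC(), {"C": [0.1, 1.0, 10.0]}, cv=inner)
        best_params = search.fit(X, y).best_params_
    model = SVC(**best_params)
    scores.append(cross_val_score(model, X, y, cv=outer).mean())

print(f"accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```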
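The training recipe from Appendix F.1 (50 epochs, early stopping with patience 5, optimal checkpoint selected on the validation set) maps onto a standard loop. In this sketch, `train_one_epoch` and `validate` are hypothetical callables standing in for the paper's training code, and the `state_dict` interface assumes a PyTorch-style model:

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validate,
                              max_epochs=50, patience=5):
    """Train up to max_epochs, keep the best-validation checkpoint,
    and stop once `patience` epochs pass without improvement."""
    best_score, best_state, stale = float("-inf"), None, 0
    for _ in range(max_epochs):
        train_one_epoch(model)
        score = validate(model)              # e.g. validation accuracy
        if score > best_score:
            best_score, stale = score, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            stale += 1
            if stale >= patience:            # patience exhausted: stop early
                break
    model.load_state_dict(best_state)        # restore the optimal checkpoint
    return model, best_score
```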