A Framework for Inference Inspired by Human Memory Mechanisms
Authors: Xiangyu Zeng, Jie Lin, Piao Hu, Ruizheng Huang, Zhicheng Zhang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We exploratively apply our PMI to improve prevailing Transformers and CNN models on question-answering tasks like bAbI-20k and Sort-of-CLEVR datasets, as well as detecting equilateral triangles, language modeling and image classification tasks, and in each case, our PMI enhancements consistently outshine their original counterparts significantly. |
| Researcher Affiliation | Academia | Xiangyu Zeng, Jie Lin, Piao Hu, Ruizheng Huang, Zhicheng Zhang. Laboratory of Intelligent Collaborative Computing, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China. {zengxy,hupiao,huangrz,zhangzc}@std.uestc.edu.cn, linjie@uestc.edu.cn |
| Pseudocode | Yes | Appendix A ('Pseudo Codes') includes 'Algorithm 1: PMI-TR Algorithm'. |
| Open Source Code | Yes | Code is available at https://github.com/zengxyyu/PMI-TR. |
| Open Datasets | Yes | To assess the efficacy of the PMI module in discovering and learning inferring entities and their relations, we conduct a preliminary exploration by incorporating it as a replacement for the pairwise self-attention layers in Transformers and ViT (Dosovitskiy et al., 2020), where memory components are shared globally. This modified architecture, called PMI-TR, is then applied to a diverse range of tasks, including visual QA, text-based QA, detecting equilateral triangles and language modeling. Readers can refer to Appendices E and F for the model hyperparameter settings and detailed descriptions of each task, respectively. ... Sort-of-CLEVR (Santoro et al., 2017) is a dataset similar to CLEVR... bAbI is a pure text-based QA dataset (Weston et al., 2015)... Enwik8 (Matt, 2011), WikiText-103 (Merity et al., 2016) and PG19 (Rae et al., 2019)... CIFAR-10 is a benchmark image dataset commonly used... (A structural sketch of this attention-replacement pattern follows the table.) |
| Dataset Splits | Yes | The bAbI-20k dataset consists of 20 distinct text-based QA tasks... Each task is divided into training, validation and test datasets, with 9k, 1k and 1k questions respectively. (A split sketch follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU specifications, or cloud computing instances. |
| Software Dependencies | No | The paper mentions optimizers like Adam and AdamW (Kingma & Ba, 2014; Loshchilov & Hutter, 2017) in Tables 7 and 8, but it does not specify version numbers for any software dependencies or libraries used for implementation (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | The hyperparameter settings of the PMI-TR model on all tasks are shown in Table 7 and Table 8, where Adam and AdamW were proposed by (Kingma & Ba, 2014) and (Loshchilov & Hutter, 2017), respectively. ... Table 7: The hyperparameter setting of PMI-TR model on four tasks. Parameters: Top-k, Number of layers, Number of attention heads, Embedding dimensions, Optimizer, Weight decay, Learning rate, Batch size, InpDropout, Seed, Number of working memory slots (N), Number of long-term memory segments (M), Size of each working memory slot (Dm), Size of each long-term memory segment, Number of MLP layers in attention, Memory attention heads, Gate style unit, Initial alpha value. (A hedged configuration sketch collecting these fields follows the table.) |
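The "Open Datasets" row quotes the paper as replacing the pairwise self-attention layers of Transformers and ViT with the PMI module, with memory components shared globally. Below is a minimal PyTorch sketch of that replacement pattern only; `PMIMemoryLayer`, `PMITransformerBlock`, the slot counts, and all internals are illustrative assumptions, since this excerpt does not describe the actual PMI mechanism.

```python
import torch
import torch.nn as nn

class PMIMemoryLayer(nn.Module):
    """Hypothetical stand-in for the paper's PMI module: tokens attend to a
    set of learned memory slots instead of attending pairwise to each other.
    The real working/long-term memory mechanics are not specified in this
    excerpt, so this is a structural placeholder only."""
    def __init__(self, dim: int, n_slots: int = 8, n_heads: int = 4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_slots, dim))  # learned slots
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Expand slots over the batch; tokens query memory, not each other.
        mem = self.memory.unsqueeze(0).expand(x.size(0), -1, -1)
        out, _ = self.attn(query=x, key=mem, value=mem)
        return self.norm(x + out)

class PMITransformerBlock(nn.Module):
    """Transformer block where memory attention replaces self-attention."""
    def __init__(self, dim: int, memory_attn: PMIMemoryLayer):
        super().__init__()
        self.memory_attn = memory_attn  # passed in so it can be shared
        self.norm = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.memory_attn(x)  # replaces pairwise self-attention
        return x + self.mlp(self.norm(x))

# One memory module reused by every block, per the "shared globally" note.
shared = PMIMemoryLayer(dim=128)
blocks = nn.ModuleList([PMITransformerBlock(128, shared) for _ in range(4)])
```

Sharing one `PMIMemoryLayer` instance across blocks is one literal reading of "memory components are shared globally"; per-layer memories would be the obvious alternative.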
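The "Dataset Splits" row reports a 9k/1k/1k train/validation/test partition per bAbI-20k task. A trivial sketch of that split follows; `questions` and the function name are placeholders, not the paper's code.

```python
def split_babi_task(questions: list) -> tuple[list, list, list]:
    # Per the quoted numbers: 9k train, 1k validation, 1k test per task.
    assert len(questions) >= 11_000, "expected at least 11k questions per task"
    return questions[:9_000], questions[9_000:10_000], questions[10_000:11_000]
```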
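The "Experiment Setup" row lists Table 7's parameter names without their values. A hedged configuration sketch collecting those fields in one place; every default below is an invented placeholder, and only the field names come from the excerpt.

```python
from dataclasses import dataclass

@dataclass
class PMITRConfig:
    """Fields mirror the parameter names quoted from Table 7; all values
    here are placeholders, not the paper's actual settings."""
    top_k: int = 4
    num_layers: int = 4
    num_attention_heads: int = 8
    embedding_dim: int = 256
    optimizer: str = "AdamW"                # paper uses Adam or AdamW per task
    weight_decay: float = 0.01
    learning_rate: float = 1e-4
    batch_size: int = 64
    inp_dropout: float = 0.1
    seed: int = 0
    num_working_memory_slots: int = 8       # N in Table 7
    num_long_term_memory_segments: int = 4  # M in Table 7
    working_memory_slot_size: int = 64      # Dm in Table 7
    long_term_memory_segment_size: int = 64
    num_mlp_layers_in_attention: int = 2
    memory_attention_heads: int = 4
    gate_style: str = "unit"                # "Gate style unit" in the quote
    initial_alpha: float = 1.0
```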