Self-Attentive Associative Memory
Authors: Hung Le, Truyen Tran, Svetha Venkatesh
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We achieve competitive results with our proposed two-memory model in a diversity of machine learning tasks, from challenging synthetic problems to practical testbeds such as geometry, graph, reinforcement learning, and question answering. |
| Researcher Affiliation | Academia | Hung Le, Truyen Tran, Svetha Venkatesh, Applied AI Institute, Deakin University, Geelong, Australia. Correspondence to: Hung Le <thai.le@deakin.edu.au>. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our source code is available at https://github.com/thaihungle/SAM. |
| Open Datasets | Yes | We test different model configurations on two classical tasks for sequential and relational learning: associative retrieval (Ba et al., 2016a) and Nth-farthest (Santoro et al., 2018); Algorithmic synthetic tasks (Graves et al., 2014); Convex hull, Traveling salesman problem (TSP) from Vinyals et al. (2015); bAbI is a question answering dataset (Weston et al., 2015); We apply our memory to LSTM agents in Atari game environment using A3C training (Mnih et al., 2016). |
| Dataset Splits | No | The paper mentions training and testing but does not provide specific dataset split information (exact percentages, sample counts, or explicit predefined splits) for validation purposes. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running its experiments, only mentioning general environments like "Atari game environment". |
| Software Dependencies | No | The paper mentions the use of an "Adam" optimizer but does not specify software names with version numbers (e.g., Python 3.x, PyTorch x.x.x). |
| Experiment Setup | Yes | We run our STM with different nq = 1, 4, 8 using the same problem setting (eight 16-dimensional input vectors), optimizer (Adam), and batch size (1600) as in Santoro et al. (2018). We evaluate our model STM (nq = 8, d = 96) against the following four baselines: LSTM (Hochreiter & Schmidhuber, 1997), attentional LSTM (Bahdanau et al., 2015), NTM (Graves et al., 2014), and RMC (Santoro et al., 2018). We ablate our STM (d = 96, full features) by creating three other versions: small STM with transfer (d = 48), small STM without transfer (d = 48, w/o transfer), and STM without gates (d = 96, w/o gates). nq is fixed to 1 as the task does not require much relational learning. |
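
To make the reported setup easier to scan, the hyperparameters quoted in the Experiment Setup row can be summarized as a configuration sketch. This is a minimal, hypothetical example only: the `STMConfig` dataclass and its field names are assumptions for illustration and do not correspond to the actual configuration interface of the released SAM code.

```python
# Hypothetical sketch of the reported experiment setup; the dataclass and its
# field names are assumptions for illustration, not taken from the authors'
# SAM repository.
from dataclasses import dataclass


@dataclass
class STMConfig:
    # Hyperparameters quoted in the Experiment Setup row
    n_queries: int = 8         # nq: 1, 4, or 8 depending on the task
    hidden_dim: int = 96       # d: 96 for full STM, 48 for the "small" variants
    use_transfer: bool = True  # disabled in the "w/o transfer" ablation
    use_gates: bool = True     # disabled in the "w/o gates" ablation
    # Nth-farthest task setting (Santoro et al., 2018)
    num_vectors: int = 8
    input_dim: int = 16
    batch_size: int = 1600
    optimizer: str = "adam"


# The STM variants described for the ablation study
variants = {
    "STM (full)": STMConfig(hidden_dim=96),
    "small STM": STMConfig(hidden_dim=48),
    "small STM w/o transfer": STMConfig(hidden_dim=48, use_transfer=False),
    "STM w/o gates": STMConfig(hidden_dim=96, use_gates=False),
}

if __name__ == "__main__":
    for name, cfg in variants.items():
        print(f"{name}: {cfg}")
```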