Self-Attentive Associative Memory

Authors: Hung Le, Truyen Tran, Svetha Venkatesh

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We achieve competitive results with our proposed two-memory model in a diversity of machine learning tasks, from challenging synthetic problems to practical testbeds such as geometry, graph, reinforcement learning, and question answering.
Researcher Affiliation | Academia | Hung Le, Truyen Tran, Svetha Venkatesh (Applied AI Institute, Deakin University, Geelong, Australia). Correspondence to: Hung Le <thai.le@deakin.edu.au>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our source code is available at https://github.com/thaihungle/SAM.
Open Datasets | Yes | We test different model configurations on two classical tasks for sequential and relational learning: associative retrieval (Ba et al., 2016a) and Nth-farthest (Santoro et al., 2018); algorithmic synthetic tasks (Graves et al., 2014); Convex hull and Traveling salesman problem (TSP) from Vinyals et al. (2015); bAbI, a question answering dataset (Weston et al., 2015); and we apply our memory to LSTM agents in the Atari game environment using A3C training (Mnih et al., 2016). (A sketch of the Nth-farthest task appears after the table.)
Dataset Splits | No | The paper mentions training and testing but does not provide specific dataset split information (exact percentages, sample counts, or explicit predefined splits).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments; it only mentions general environments such as the Atari game environment.
Software Dependencies | No | The paper mentions the Adam optimizer but does not specify software names with version numbers (e.g., Python 3.x, PyTorch x.x.x).
Experiment Setup | Yes | We run our STM with nq = 1, 4, 8 using the same problem setting (8 input vectors, each 16-dimensional), optimizer (Adam), and batch size (1600) as in Santoro et al. (2018). We evaluate our model STM (nq = 8, d = 96) against four baselines: LSTM (Hochreiter & Schmidhuber, 1997), attentional LSTM (Bahdanau et al., 2015), NTM (Graves et al., 2014), and RMC (Santoro et al., 2018). We ablate our STM (d = 96, full features) by creating three other versions: small STM with transfer (d = 48), small STM without transfer (d = 48, w/o transfer), and STM without gates (d = 96, w/o gates). nq is fixed to 1 because the task does not require much relational learning. (These settings are restated as configuration dictionaries after the table.)
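The Nth-farthest task referenced in the Open Datasets and Experiment Setup rows is only described at a high level here, so the following is a minimal sketch of how one example could be generated, assuming the usual formulation from Santoro et al. (2018): given eight 16-dimensional vectors with random labels, answer which vector is the n-th farthest from a named reference vector. The function name and the encoding (one-hot codes appended to each input vector) are illustrative assumptions, not taken from the paper or its code release.

```python
import numpy as np

def nth_farthest_example(num_vectors=8, dim=16, seed=None):
    """Hypothetical generator for one N-th farthest example (encoding assumed)."""
    rng = np.random.default_rng(seed)
    vectors = rng.standard_normal((num_vectors, dim))  # 8 vectors, 16-dim each
    labels = rng.permutation(num_vectors)              # random label per vector
    n = rng.integers(num_vectors)                      # rank to query (0 = farthest)
    ref = rng.integers(num_vectors)                    # index of the reference vector
    # Rank all vectors by distance to the reference; the answer is the label
    # of the vector at rank n.
    dists = np.linalg.norm(vectors - vectors[ref], axis=1)
    target_label = labels[np.argsort(-dists)[n]]
    # Distribute the query over the sequence: each step carries the vector plus
    # one-hot codes for its label, the queried rank n, and the reference label.
    onehot = np.eye(num_vectors)
    inputs = np.concatenate(
        [vectors,
         onehot[labels],
         np.tile(onehot[n], (num_vectors, 1)),
         np.tile(onehot[labels[ref]], (num_vectors, 1))],
        axis=1)
    return inputs.astype(np.float32), int(target_label)
```

With these defaults each example is an 8 x 40 input matrix plus an integer class target, consistent with the "8 input vectors, each 16-dimensional" setting quoted in the Experiment Setup row.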
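For a compact view of the hyperparameters the Experiment Setup row does report, here is a sketch that restates them as plain configuration dictionaries. Only the values quoted above (nq, d, optimizer, batch size, baselines, and the ablation switches) are encoded; key names such as "transfer" and "gates" are illustrative, and hyperparameters the paper does not state here (e.g., learning rate, training steps) are deliberately left out.

```python
# N-th farthest sweep: nq in {1, 4, 8}, same setting as Santoro et al. (2018).
NTH_FARTHEST_RUNS = [
    {"model": "STM", "nq": nq, "num_vectors": 8, "input_dim": 16,
     "optimizer": "Adam", "batch_size": 1600}
    for nq in (1, 4, 8)
]

# Baselines compared against STM (nq = 8, d = 96).
BASELINES = ["LSTM", "attentional LSTM", "NTM", "RMC"]

# Ablations of the full STM (d = 96); switch names are illustrative, not the paper's.
ABLATIONS = [
    {"model": "STM", "d": 96, "transfer": True,  "gates": True},   # full model
    {"model": "STM", "d": 48, "transfer": True,  "gates": True},   # small, with transfer
    {"model": "STM", "d": 48, "transfer": False, "gates": True},   # small, w/o transfer
    {"model": "STM", "d": 96, "transfer": True,  "gates": False},  # w/o gates
]
```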