Self-Attentive Associative Memory

Authors: Hung Le, Truyen Tran, Svetha Venkatesh

ICML 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We achieve competitive results with our proposed two-memory model in a diversity of machine learning tasks, from challenging synthetic problems to practical testbeds such as geometry, graph, reinforcement learning, and question answering.
Researcher Affiliation | Academia | Hung Le, Truyen Tran, Svetha Venkatesh (Applied AI Institute, Deakin University, Geelong, Australia). Correspondence to: Hung Le <thai.le@deakin.edu.au>.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our source code is available at https://github.com/thaihungle/SAM.
Open Datasets | Yes | We test different model configurations on two classical tasks for sequential and relational learning: associative retrieval (Ba et al., 2016a) and Nth-farthest (Santoro et al., 2018); algorithmic synthetic tasks (Graves et al., 2014); Convex hull and Traveling salesman problem (TSP) from Vinyals et al. (2015); bAbI, a question answering dataset (Weston et al., 2015); and we apply our memory to LSTM agents in the Atari game environment using A3C training (Mnih et al., 2016). (A sketch of the Nth-farthest task appears after the table.)
Dataset Splits | No | The paper mentions training and testing but does not provide specific dataset split information (exact percentages, sample counts, or explicit predefined splits).
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for its experiments; it only mentions general environments such as the Atari game environment.
Software Dependencies | No | The paper mentions the Adam optimizer but does not specify software names with version numbers (e.g., Python 3.x, PyTorch x.x.x).
Experiment Setup | Yes | We run our STM with nq = 1, 4, 8 using the same problem setting (8 input vectors, each 16-dimensional), optimizer (Adam), and batch size (1600) as in Santoro et al. (2018). We evaluate our model STM (nq = 8, d = 96) against four baselines: LSTM (Hochreiter & Schmidhuber, 1997), attentional LSTM (Bahdanau et al., 2015), NTM (Graves et al., 2014), and RMC (Santoro et al., 2018). We ablate our STM (d = 96, full features) by creating three other versions: small STM with transfer (d = 48), small STM without transfer (d = 48, w/o transfer), and STM without gates (d = 96, w/o gates). nq is fixed to 1 because the task does not require much relational learning. (These settings are restated as configuration dictionaries after the table.)
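The Nth-farthest task referenced in the Open Datasets and Experiment Setup rows is only described at a high level here, so the following is a minimal sketch of how one example could be generated, assuming the usual formulation from Santoro et al. (2018): given eight 16-dimensional vectors with random labels, answer which vector is the n-th farthest from a named reference vector. The function name and the encoding (one-hot codes appended to each input vector) are illustrative assumptions, not taken from the paper or its code release.

```python
import numpy as np

def nth_farthest_example(num_vectors=8, dim=16, seed=None):
    """Hypothetical generator for one N-th farthest example (encoding assumed)."""
    rng = np.random.default_rng(seed)
    vectors = rng.standard_normal((num_vectors, dim))  # 8 vectors, 16-dim each
    labels = rng.permutation(num_vectors)              # random label per vector
    n = rng.integers(num_vectors)                      # rank to query (0 = farthest)
    ref = rng.integers(num_vectors)                    # index of the reference vector
    # Rank all vectors by distance to the reference; the answer is the label
    # of the vector at rank n.
    dists = np.linalg.norm(vectors - vectors[ref], axis=1)
    target_label = labels[np.argsort(-dists)[n]]
    # Distribute the query over the sequence: each step carries the vector plus
    # one-hot codes for its label, the queried rank n, and the reference label.
    onehot = np.eye(num_vectors)
    inputs = np.concatenate(
        [vectors,
         onehot[labels],
         np.tile(onehot[n], (num_vectors, 1)),
         np.tile(onehot[labels[ref]], (num_vectors, 1))],
        axis=1)
    return inputs.astype(np.float32), int(target_label)
```

With these defaults each example is an 8 x 40 input matrix plus an integer class target, consistent with the "8 input vectors, each 16-dimensional" setting quoted in the Experiment Setup row.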
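For a compact view of the hyperparameters the Experiment Setup row does report, here is a sketch that restates them as plain configuration dictionaries. Only the values quoted above (nq, d, optimizer, batch size, baselines, and the ablation switches) are encoded; key names such as "transfer" and "gates" are illustrative, and hyperparameters the paper does not state here (e.g., learning rate, training steps) are deliberately left out.

```python
# N-th farthest sweep: nq in {1, 4, 8}, same setting as Santoro et al. (2018).
NTH_FARTHEST_RUNS = [
    {"model": "STM", "nq": nq, "num_vectors": 8, "input_dim": 16,
     "optimizer": "Adam", "batch_size": 1600}
    for nq in (1, 4, 8)
]

# Baselines compared against STM (nq = 8, d = 96).
BASELINES = ["LSTM", "attentional LSTM", "NTM", "RMC"]

# Ablations of the full STM (d = 96); switch names are illustrative, not the paper's.
ABLATIONS = [
    {"model": "STM", "d": 96, "transfer": True,  "gates": True},   # full model
    {"model": "STM", "d": 48, "transfer": True,  "gates": True},   # small, with transfer
    {"model": "STM", "d": 48, "transfer": False, "gates": True},   # small, w/o transfer
    {"model": "STM", "d": 96, "transfer": True,  "gates": False},  # w/o gates
]
```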