Learning to Rehearse in Long Sequence Memorization

Authors: Zhu Zhang, Chang Zhou, Jianxin Ma, Zhijie Lin, Jingren Zhou, Hongxia Yang, Zhou Zhao

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate the performance of our rehearsal memory by the synthetic bAbI task and several downstream tasks, including text/video question answering and recommendation on long sequences. In this section, we first verify our rehearsal memory on the widely-used short-sequence reasoning task bAbI. Next, we mainly compare our approach with diverse baselines on several long-sequence reasoning tasks. We then perform ablation studies on the memory rehearsal techniques and analyze the impact of crucial hyper-parameters.
Researcher Affiliation | Collaboration | Zhejiang University, China; DAMO Academy, Alibaba Group, China.
Pseudocode | No | The paper describes methods in text and uses figures to illustrate components (e.g., Figures 1 and 2), but it does not contain a formal 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper does not include any explicit statement about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | The bAbI dataset (Weston et al., 2015) is a synthetic text question answering benchmark and widely applied to evaluate the memorization and reasoning performance of MANNs. We apply the NarrativeQA dataset (Kočiský et al., 2018) with long input contents for long-sequence text question answering. The ActivityNet-QA dataset (Yu et al., 2019) contains 5,800 videos from the ActivityNet (Caba Heilbron et al., 2015). The XLong dataset (Ren et al., 2019) is sampled from the click logs on Alibaba.
Dataset Splits | Yes | Table 2. Performance Comparisons for Long-Sequence Text Question Answering on NarrativeQA (columns: Method, Setting, Val MRR, Test MRR). During the training stage, we simultaneously develop self-supervised rehearsal training and task-specific reasoning training based on the memory M.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using an 'Adam optimizer' and components like 'Transformer encoder' and 'GRU unit', but does not specify any software versions for programming languages, libraries, or frameworks (e.g., Python version, TensorFlow/PyTorch versions).
Experiment Setup | Yes | We set the layer number of the Transformer encoder and bi-directional Transformer decoder to 3. The head number in Multi-Head Attention is set to 4. We set λ1, λ2 and λ3 to 1.0, 0.5 and 1.0, respectively. The number B of history fragments is set to 6. During training, we apply an Adam optimizer (Duchi et al., 2011) to minimize the multi-task loss L_rm, where the initial learning rate is set to 0.001. We set the d_x and d_model to 128. The number K of memory slots is set to 20. And we naturally take each sentence in input texts as a segment and the maximum length N of segments is set to 15. For our rehearsal memory, we set the d_x and d_model to 256. The number K of memory slots is set to 20. We naturally take each sentence in summaries as a segment and the maximum length N of segments is set to 20. For our rehearsal memory, we set the d_x and d_model to 256. The number K of memory slots and length N of segments are both set to 20. For our rehearsal memory, we set the d_x and d_model to 64. The number K of memory slots and length N of segments are both set to 20.
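
To make the quoted settings easier to scan, the sketch below collects them into a single configuration object. It is a minimal illustration only: the class name, field names, and the QUOTED_VARIANTS list are hypothetical and do not appear in the paper; the numeric values are taken verbatim from the setup excerpt, and the excerpt does not state which per-task variant corresponds to which dataset.

# Minimal configuration sketch of the hyper-parameters quoted above.
# Class and field names are hypothetical; only the numeric values
# come from the quoted experiment-setup text.
from dataclasses import dataclass

@dataclass
class RehearsalMemoryConfig:
    encoder_layers: int = 3       # Transformer encoder / bi-directional decoder layers
    attention_heads: int = 4      # heads in Multi-Head Attention
    lambda_1: float = 1.0         # weights of the multi-task loss L_rm
    lambda_2: float = 0.5
    lambda_3: float = 1.0
    history_fragments: int = 6    # number B of history fragments
    learning_rate: float = 1e-3   # initial learning rate for the Adam optimizer
    memory_slots: int = 20        # number K of memory slots
    d_x: int = 128                # input feature size (task dependent)
    d_model: int = 128            # model size (task dependent)
    max_segment_len: int = 15     # maximum length N of a segment (task dependent)

# The four task-dependent variants quoted in the setup text, in the order
# they appear; the excerpt does not say which dataset each one belongs to.
QUOTED_VARIANTS = [
    dict(d_x=128, d_model=128, memory_slots=20, max_segment_len=15),
    dict(d_x=256, d_model=256, memory_slots=20, max_segment_len=20),
    dict(d_x=256, d_model=256, memory_slots=20, max_segment_len=20),
    dict(d_x=64,  d_model=64,  memory_slots=20, max_segment_len=20),
]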