Continual General Chunking Problem and SyncMap

Authors: Danilo Vasconcellos Vargas, Toshitake Asabuki

AAAI 2021, pp. 10006-10014

Reproducibility variables, each listing the assessed result and the LLM response (supporting excerpt or justification):
Research Type: Experimental
LLM Response: "Experiments on fixed chunks, probabilistic chunks and temporal structures suggest that SyncMap reaches near-optimal solutions. The same is true for continual variations of them, i.e., when such probabilistic chunks or temporal structures change throughout the experiment. The experiments comprise a total of nine different tests encompassing fixed chunks, mixed structures, their continual variations, long chunks, overlapped chunks and real-world scenarios." From the Results and Analysis section: "In this paper, we define the optimality of solutions by the degree of correlation with the ground truth. For all tests, as a correlation metric, we measured the normalized mutual information scores for Word2vec, MRIL, PARSER and SyncMap (Tables 1, 2 and 3)."
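Where the excerpt cites normalized mutual information as the correlation metric, the scoring step can be sketched as below. The use of scikit-learn and the toy labels are assumptions; the paper does not name its implementation.

```python
# Minimal sketch: scoring predicted chunk assignments against ground truth
# with normalized mutual information, the metric named in the excerpt above.
# scikit-learn is an assumed implementation, and the labels are hypothetical.
from sklearn.metrics import normalized_mutual_info_score

ground_truth = [0, 0, 0, 1, 1, 2, 2, 2]  # true chunk id per input variable
predicted    = [1, 1, 1, 0, 0, 2, 2, 2]  # label permutation does not matter

score = normalized_mutual_info_score(ground_truth, predicted)
print(f"NMI: {score:.3f}")  # 1.0 for a perfect match up to relabeling
```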
Researcher Affiliation: Academia
LLM Response: "(1) Kyushu University, Fukuoka, Japan; (2) The University of Tokyo, Tokyo, Japan; (3) Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan"
Pseudocode: No
LLM Response: The paper describes the dynamics of SyncMap using mathematical equations and textual explanations but does not include a structured pseudocode or algorithm block.
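Since those equations are not reproduced in this record, the following is only a hedged sketch of attraction/repulsion dynamics of the kind the paper describes: variables active in the same time window are pulled toward their shared centroid, while the centroid of the inactive set exerts a repelling force. The exact update rule, its signs, and the normalization are assumptions, not the authors' code.

```python
# Hedged sketch of SyncMap-style dynamics (a paraphrase, not the authors'
# implementation). Each input variable owns a point in a k-dimensional map
# space; co-activated variables drift together over many steps, so chunks
# emerge as spatial clusters.
import numpy as np

def syncmap_step(weights, active_mask, alpha=0.1, eps=1e-8):
    """One adaptation step. `weights` is (n_vars, k); `active_mask` is boolean."""
    pos, neg = weights[active_mask], weights[~active_mask]
    if len(pos) <= 1 or len(neg) <= 1:   # both sets must define a centroid
        return weights
    cp, cn = pos.mean(axis=0), neg.mean(axis=0)
    # Attract active variables to the positive centroid and repel them from
    # the negative centroid. Which set receives which force, and the unit
    # normalization, are assumptions made for this sketch.
    delta = alpha * (
        (cp - pos) / (np.linalg.norm(cp - pos, axis=1, keepdims=True) + eps)
        - (cn - pos) / (np.linalg.norm(cn - pos, axis=1, keepdims=True) + eps)
    )
    weights[active_mask] = pos + delta
    return weights
```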
Open Source Code: No
LLM Response: The paper does not provide concrete access to source code for the methodology described. There is no explicit statement of code release or a link to a repository.
Open Datasets: Yes
LLM Response: "Real-World Scenarios. We test on two variations of a real-world scenario: specifically, the recognition of probabilistic chunks in the first-order Markov model of theme transitions for humpback whale song types (Garland et al. 2017)."
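For context, a first-order Markov model of this kind can be sampled as follows. The transition matrix and theme labels are illustrative placeholders, not the Garland et al. (2017) data.

```python
# Generic sketch: sampling a symbol sequence from a first-order Markov model,
# the structure underlying the humpback-whale song-theme scenario. The
# matrix below is made up for illustration; it is NOT the published data.
import numpy as np

rng = np.random.default_rng(0)
themes = ["A", "B", "C"]
P = np.array([[0.1, 0.8, 0.1],   # P[i, j] = prob. of moving from theme i to j
              [0.2, 0.1, 0.7],
              [0.6, 0.3, 0.1]])

state, sequence = 0, []
for _ in range(1000):
    state = rng.choice(len(themes), p=P[state])
    sequence.append(themes[state])
print("".join(sequence[:30]))
```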
Dataset Splits: No
LLM Response: The paper describes how samples are generated and processed sequentially over time (e.g., "100000 samples of the problem" followed by a "second problem also presenting 100000 samples") but does not specify explicit training, validation, and test splits with percentages or counts.
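The continual protocol this alludes to, one long stream whose structure changes halfway through rather than a train/validation/test split, can be sketched like this; the generator bodies are hypothetical placeholders for the paper's sequence generators.

```python
# Sketch of the continual protocol: 100000 samples from one problem followed
# by 100000 from another, consumed online with no held-out split. The two
# generators below are hypothetical stand-ins.
from itertools import chain, islice

def problem_1():
    while True:
        yield from "abcdef"   # placeholder for the first structure

def problem_2():
    while True:
        yield from "ghijkl"   # placeholder for the changed structure

stream = chain(islice(problem_1(), 100000), islice(problem_2(), 100000))
for symbol in stream:
    pass  # the learner adapts online to each incoming symbol
```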
Hardware Specification: Yes
LLM Response: "All tests are run on a MacBook Pro laptop (macOS 10.15.5, 2.3 GHz, 16 GB RAM), as they demand little computational effort."
Software Dependencies: No
LLM Response: The paper mentions software components such as DBSCAN and Word2vec but does not provide version numbers for these or any other ancillary software dependencies.
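Of the two named components, DBSCAN plausibly serves as the read-out step that turns learned map coordinates into discrete chunks. A scikit-learn sketch follows; the eps and min_samples values are assumptions, since the paper reports neither versions nor settings.

```python
# Sketch of a chunk read-out step: cluster map coordinates with DBSCAN so
# that each cluster of input variables is reported as one chunk. eps and
# min_samples are assumed values for illustration only.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
coords = rng.random((10, 3))              # stand-in for learned map coordinates
labels = DBSCAN(eps=0.3, min_samples=2).fit_predict(coords)
chunks = {c: np.flatnonzero(labels == c).tolist() for c in set(labels) if c != -1}
print(chunks)                              # variable indices grouped into chunks
```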
Experiment Setup: Yes
LLM Response: "SyncMap's parameters α and k are fixed to 0.1 and 3, respectively. Regarding the PARSER algorithm, since it finds possible n-grams rather than whole chunks, we first excluded the unnecessarily long n-grams (n > 6) and concatenated the remaining short segments that share the same element. These resultant segments were regarded as the 'chunks' that PARSER extracted. A dense deep neural network was used as the model for Word2vec, with a latent dimension of 3 and an output layer with softmax and size equal to the number of inputs. The chosen training parameters are 10 epochs, a 1e-3 learning rate, and a batch size of 64, with mean squared error as the loss. Regarding MRIL, we used five output neurons for all tasks, with a learning rate of 1e-3."
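The quoted Word2vec baseline maps onto a small dense network; a sketch with those hyperparameters follows. Keras and the Adam optimizer are assumptions, since the paper specifies neither the framework nor the optimizer, and the vocabulary size and training pairs are placeholders.

```python
# Sketch of the quoted Word2vec baseline: a dense network with a 3-dimensional
# latent layer and a softmax output the size of the input vocabulary, trained
# for 10 epochs with MSE loss, learning rate 1e-3, batch size 64. Keras and
# Adam are assumed; the paper names neither.
import numpy as np
from tensorflow import keras

n_inputs = 10                                    # illustrative vocabulary size
model = keras.Sequential([
    keras.layers.Input(shape=(n_inputs,)),
    keras.layers.Dense(3),                       # latent dimension of 3
    keras.layers.Dense(n_inputs, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# Hypothetical one-hot (context, target) pairs standing in for the streams.
x = np.eye(n_inputs)[np.random.randint(0, n_inputs, 2048)]
y = np.eye(n_inputs)[np.random.randint(0, n_inputs, 2048)]
model.fit(x, y, epochs=10, batch_size=64, verbose=0)
```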