Continual General Chunking Problem and SyncMap

Authors: Danilo Vasconcellos Vargas, Toshitake Asabuki

AAAI 2021, pp. 10006-10014

Reproducibility variables, each listing the assessed result and the LLM response (supporting excerpt or justification):
Research Type: Experimental
LLM Response: "Experiments on fixed chunks, probabilistic chunks and temporal structures suggest that SyncMap reaches near-optimal solutions. The same is true for continual variations of them, i.e., when such probabilistic chunks or temporal structures change throughout the experiment. The experiments comprise a total of nine different tests encompassing fixed chunks, mixed structures, their continual variations, long chunks, overlapped chunks and real-world scenarios." From the Results and Analysis section: "In this paper, we define the optimality of solutions by the degree of correlation with the ground truth. For all tests, as a correlation metric, we measured the normalized mutual information scores for Word2vec, MRIL, PARSER and SyncMap (Tables 1, 2 and 3)."
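Where the excerpt cites normalized mutual information as the correlation metric, the scoring step can be sketched as below. The use of scikit-learn and the toy labels are assumptions; the paper does not name its implementation.

```python
# Minimal sketch: scoring predicted chunk assignments against ground truth
# with normalized mutual information, the metric named in the excerpt above.
# scikit-learn is an assumed implementation, and the labels are hypothetical.
from sklearn.metrics import normalized_mutual_info_score

ground_truth = [0, 0, 0, 1, 1, 2, 2, 2]  # true chunk id per input variable
predicted    = [1, 1, 1, 0, 0, 2, 2, 2]  # label permutation does not matter

score = normalized_mutual_info_score(ground_truth, predicted)
print(f"NMI: {score:.3f}")  # 1.0 for a perfect match up to relabeling
```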
Researcher Affiliation: Academia
LLM Response: "(1) Kyushu University, Fukuoka, Japan; (2) The University of Tokyo, Tokyo, Japan; (3) Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan"
Pseudocode: No
LLM Response: The paper describes the dynamics of SyncMap using mathematical equations and textual explanations but does not include a structured pseudocode or algorithm block.
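Since those equations are not reproduced in this record, the following is only a hedged sketch of attraction/repulsion dynamics of the kind the paper describes: variables active in the same time window are pulled toward their shared centroid, while the centroid of the inactive set exerts a repelling force. The exact update rule, its signs, and the normalization are assumptions, not the authors' code.

```python
# Hedged sketch of SyncMap-style dynamics (a paraphrase, not the authors'
# implementation). Each input variable owns a point in a k-dimensional map
# space; co-activated variables drift together over many steps, so chunks
# emerge as spatial clusters.
import numpy as np

def syncmap_step(weights, active_mask, alpha=0.1, eps=1e-8):
    """One adaptation step. `weights` is (n_vars, k); `active_mask` is boolean."""
    pos, neg = weights[active_mask], weights[~active_mask]
    if len(pos) <= 1 or len(neg) <= 1:   # both sets must define a centroid
        return weights
    cp, cn = pos.mean(axis=0), neg.mean(axis=0)
    # Attract active variables to the positive centroid and repel them from
    # the negative centroid. Which set receives which force, and the unit
    # normalization, are assumptions made for this sketch.
    delta = alpha * (
        (cp - pos) / (np.linalg.norm(cp - pos, axis=1, keepdims=True) + eps)
        - (cn - pos) / (np.linalg.norm(cn - pos, axis=1, keepdims=True) + eps)
    )
    weights[active_mask] = pos + delta
    return weights
```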
Open Source Code: No
LLM Response: The paper does not provide concrete access to source code for the methodology described. There is no explicit statement of code release or a link to a repository.
Open Datasets: Yes
LLM Response: "Real-World Scenarios. We test on two variations of a real-world scenario: specifically, the recognition of probabilistic chunks in the first-order Markov model of theme transitions for humpback whale song types (Garland et al. 2017)."
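For context, a first-order Markov model of this kind can be sampled as follows. The transition matrix and theme labels are illustrative placeholders, not the Garland et al. (2017) data.

```python
# Generic sketch: sampling a symbol sequence from a first-order Markov model,
# the structure underlying the humpback-whale song-theme scenario. The
# matrix below is made up for illustration; it is NOT the published data.
import numpy as np

rng = np.random.default_rng(0)
themes = ["A", "B", "C"]
P = np.array([[0.1, 0.8, 0.1],   # P[i, j] = prob. of moving from theme i to j
              [0.2, 0.1, 0.7],
              [0.6, 0.3, 0.1]])

state, sequence = 0, []
for _ in range(1000):
    state = rng.choice(len(themes), p=P[state])
    sequence.append(themes[state])
print("".join(sequence[:30]))
```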
Dataset Splits: No
LLM Response: The paper describes how samples are generated and processed sequentially over time (e.g., "100000 samples of the problem" followed by a "second problem also presenting 100000 samples") but does not specify explicit training, validation, and test splits with percentages or counts.
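The continual protocol this alludes to, one long stream whose structure changes halfway through rather than a train/validation/test split, can be sketched like this; the generator bodies are hypothetical placeholders for the paper's sequence generators.

```python
# Sketch of the continual protocol: 100000 samples from one problem followed
# by 100000 from another, consumed online with no held-out split. The two
# generators below are hypothetical stand-ins.
from itertools import chain, islice

def problem_1():
    while True:
        yield from "abcdef"   # placeholder for the first structure

def problem_2():
    while True:
        yield from "ghijkl"   # placeholder for the changed structure

stream = chain(islice(problem_1(), 100000), islice(problem_2(), 100000))
for symbol in stream:
    pass  # the learner adapts online to each incoming symbol
```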
Hardware Specification: Yes
LLM Response: "All tests are run on a MacBook Pro laptop (macOS 10.15.5, 2.3 GHz, 16 GB RAM), as they demand little computational effort."
Software Dependencies: No
LLM Response: The paper mentions software components such as DBSCAN and Word2vec but does not provide version numbers for these or any other ancillary software dependencies.
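Of the two named components, DBSCAN plausibly serves as the read-out step that turns learned map coordinates into discrete chunks. A scikit-learn sketch follows; the eps and min_samples values are assumptions, since the paper reports neither versions nor settings.

```python
# Sketch of a chunk read-out step: cluster map coordinates with DBSCAN so
# that each cluster of input variables is reported as one chunk. eps and
# min_samples are assumed values for illustration only.
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
coords = rng.random((10, 3))              # stand-in for learned map coordinates
labels = DBSCAN(eps=0.3, min_samples=2).fit_predict(coords)
chunks = {c: np.flatnonzero(labels == c).tolist() for c in set(labels) if c != -1}
print(chunks)                              # variable indices grouped into chunks
```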
Experiment Setup: Yes
LLM Response: "SyncMap's parameters α and k are fixed to 0.1 and 3, respectively. Regarding the PARSER algorithm, since it finds possible n-grams rather than whole chunks, we first excluded the unnecessarily long n-grams (n > 6) and concatenated the remaining short segments that share the same element. These resultant segments were regarded as the 'chunks' that PARSER extracted. A dense deep neural network was used as the model for Word2vec, with a latent dimension of 3 and an output layer with softmax and size equal to the number of inputs. The chosen training parameters are 10 epochs, a 1e-3 learning rate, and a batch size of 64, with mean squared error as the loss. Regarding MRIL, we used five output neurons for all tasks, with a learning rate of 1e-3."
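The quoted Word2vec baseline maps onto a small dense network; a sketch with those hyperparameters follows. Keras and the Adam optimizer are assumptions, since the paper specifies neither the framework nor the optimizer, and the vocabulary size and training pairs are placeholders.

```python
# Sketch of the quoted Word2vec baseline: a dense network with a 3-dimensional
# latent layer and a softmax output the size of the input vocabulary, trained
# for 10 epochs with MSE loss, learning rate 1e-3, batch size 64. Keras and
# Adam are assumed; the paper names neither.
import numpy as np
from tensorflow import keras

n_inputs = 10                                    # illustrative vocabulary size
model = keras.Sequential([
    keras.layers.Input(shape=(n_inputs,)),
    keras.layers.Dense(3),                       # latent dimension of 3
    keras.layers.Dense(n_inputs, activation="softmax"),
])
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3), loss="mse")

# Hypothetical one-hot (context, target) pairs standing in for the streams.
x = np.eye(n_inputs)[np.random.randint(0, n_inputs, 2048)]
y = np.eye(n_inputs)[np.random.randint(0, n_inputs, 2048)]
model.fit(x, y, epochs=10, batch_size=64, verbose=0)
```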