Continual General Chunking Problem and SyncMap
Authors: Danilo Vasconcellos Vargas, Toshitake Asabuki
AAAI 2021, pp. 10006-10014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on fixed chunks, probabilistic chunks and temporal structures suggest that SyncMap reaches near-optimal solutions. The same is true for continual variations of them, i.e., when such probabilistic chunks or temporal structures change throughout the experiment. The experiments compose a total of nine different tests encompassing fixed chunks, mixed structures, their continual variations, long chunks, overlapped chunks and real-world scenarios. Results and Analysis: In this paper, we define the optimality of solutions by the degree of correlation with the ground truth. For all tests, as a correlation metric, we measured the normalized mutual information scores for Word2vec, MRIL, PARSER and SyncMap (Tables 1, 2 and 3). (A minimal sketch of this NMI scoring follows the table.) |
| Researcher Affiliation | Academia | 1) Kyushu University, Fukuoka, Japan; 2) The University of Tokyo, Tokyo, Japan; 3) Okinawa Institute of Science and Technology Graduate University, Okinawa, Japan |
| Pseudocode | No | The paper describes the dynamics of SyncMap using mathematical equations and textual explanations but does not include a structured pseudocode or algorithm block. (A loose, explicitly hypothetical illustration of such dynamics follows the table.) |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. There is no explicit statement of code release or a link to a repository. |
| Open Datasets | Yes | Real World Scenarios: We test in two variations of a real-world scenario. Specifically, the recognition of probabilistic chunks in the first-order Markov model of theme transitions for humpback whale song types (Garland et al. 2017). (A Markov-chain sampling sketch follows the table.) |
| Dataset Splits | No | The paper describes how samples are generated and processed sequentially over time (e.g., '100000 samples of the problem' followed by 'second problem also presenting 100000 samples') but does not specify explicit training, validation, and test dataset splits with percentages or counts. |
| Hardware Specification | Yes | All tests are run on a MacBook Pro (macOS 10.15.5, 2.3 GHz, 16 GB RAM) laptop, as they demand little computational effort. |
| Software Dependencies | No | The paper mentions software components like DBSCAN and Word2vec but does not provide specific version numbers for these or any other ancillary software dependencies. |
| Experiment Setup | Yes | SyncMap's parameters α and k are fixed to 0.1 and 3, respectively. Regarding the PARSER algorithm, since it finds possible n-grams, hence not whole chunks, we first excluded the unnecessarily long n-grams (n > 6) and concatenated the rest of the short segments that share the same element. These resultant segments were regarded as the 'chunks' that PARSER extracted. A dense deep neural network was used as the model for Word2vec, with a latent dimension of 3 and an output layer with softmax and size equal to the number of inputs. The chosen training parameters are 10 epochs, a 1e-3 learning rate, and a batch size of 64, with mean squared error as the loss. Regarding MRIL, we used five output neurons for all tasks, with a learning rate of 1e-3. (A Keras sketch of the Word2vec baseline follows the table.) |
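
The Research Type row above reports that chunking quality is scored by normalized mutual information against ground-truth chunk assignments. Below is a minimal sketch of that kind of scoring using scikit-learn's `normalized_mutual_info_score`; the label arrays are hypothetical stand-ins, not data from the paper.

```python
from sklearn.metrics import normalized_mutual_info_score

# Hypothetical ground-truth chunk ids over a 12-step sequence (two chunks).
ground_truth = [0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1]

# Hypothetical chunk ids recovered by a learner; the labels are permuted,
# which NMI is invariant to.
predicted = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]

score = normalized_mutual_info_score(ground_truth, predicted)
print(f"NMI: {score:.3f}")  # 1.000 here, since the two partitions agree
```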
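
The Pseudocode row notes that SyncMap's dynamics appear in the paper only as equations and text. The sketch below is not the paper's update rule; it is a loose illustration of the general idea of self-organizing attraction/repulsion dynamics, where co-activated inputs contract toward each other in a k-dimensional map and other inputs are pushed away. The centroid update, the clipping, and the activation pattern are all assumptions made for the illustration; only α = 0.1 and k = 3 come from the Experiment Setup row.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, k, alpha = 10, 3, 0.1  # alpha and k from the Experiment Setup row

# Each input state owns a point in a k-dimensional map space.
weights = rng.uniform(-1.0, 1.0, size=(n_states, k))

def update(active: np.ndarray) -> None:
    """One assumed step: active nodes contract toward their centroid,
    inactive nodes are pushed away from it. NOT the paper's equations."""
    inactive = np.setdiff1d(np.arange(n_states), active)
    if len(active) < 2 or len(inactive) < 2:
        return
    centroid = weights[active].mean(axis=0)
    weights[active] += alpha * (centroid - weights[active])
    weights[inactive] -= alpha * (centroid - weights[inactive])
    np.clip(weights, -1.0, 1.0, out=weights)  # keep the map bounded (assumed)

# Drive the map with repeated co-activation of a hypothetical chunk {0, 1, 2}.
for _ in range(200):
    update(np.array([0, 1, 2]))

# States 0-2 now sit close together; a density-based clustering of
# `weights` (the paper mentions DBSCAN) would read them out as one chunk.
print(np.round(weights[:4], 2))
```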
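
The Open Datasets row describes sequences drawn from a first-order Markov model of whale-song theme transitions (Garland et al. 2017). Below is a sketch of sampling from such a model; the 4-state transition matrix is a hypothetical placeholder, not the one estimated in the cited work.

```python
import numpy as np

rng = np.random.default_rng(42)

# transition[i, j] = P(next theme = j | current theme = i); rows sum to 1.
# These probabilities are invented for the sketch.
transition = np.array([
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.7, 0.1],
    [0.7, 0.1, 0.1, 0.1],
    [0.3, 0.3, 0.3, 0.1],
])

def sample_sequence(length: int, start: int = 0) -> list[int]:
    """Draw a theme sequence by repeatedly sampling the next state."""
    states = [start]
    for _ in range(length - 1):
        states.append(int(rng.choice(4, p=transition[states[-1]])))
    return states

print(sample_sequence(20))
```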
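
The Experiment Setup row fixes the Word2vec baseline's hyperparameters: a dense network with a latent dimension of 3, a softmax output sized to the number of inputs, 10 epochs, a 1e-3 learning rate, a batch size of 64, and MSE loss. The Keras sketch below wires those numbers together; the one-hot encoding, the single hidden layer, the Adam optimizer, and the toy next-symbol targets are assumptions, since the row does not specify them.

```python
import numpy as np
from tensorflow import keras

n_symbols = 10  # hypothetical alphabet size (number of distinct inputs)

# Dense "Word2vec" baseline: one-hot input -> 3-dim latent -> softmax.
model = keras.Sequential([
    keras.layers.Input(shape=(n_symbols,)),
    keras.layers.Dense(3, name="latent"),  # linear 3-dim embedding
    keras.layers.Dense(n_symbols, activation="softmax"),
])
model.compile(
    optimizer=keras.optimizers.Adam(learning_rate=1e-3),  # Adam is assumed
    loss="mse",                                           # MSE loss per the row
)

# Toy training pairs (assumed): predict the next symbol from the current
# one, both one-hot encoded; real inputs would come from the chunked stream.
rng = np.random.default_rng(0)
current = rng.integers(0, n_symbols, size=1024)
nxt = (current + 1) % n_symbols
x = np.eye(n_symbols)[current]
y = np.eye(n_symbols)[nxt]

model.fit(x, y, epochs=10, batch_size=64, verbose=0)

# The 3-dim symbol embeddings are the latent layer's kernel.
embeddings = model.get_layer("latent").get_weights()[0]  # (n_symbols, 3)
print(embeddings.shape)
```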