Knowledge-Based Sequence Mining with ASP
Authors: Martin Gebser, Thomas Guyet, René Quiniou, Javier Romero, Torsten Schaub
IJCAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To this end, we conducted experiments on simulated databases. First, we present time efficiency results comparing our approach with the CP-based one of CPSM [Negrevergne and Guns, 2015]. Then, we illustrate the effectiveness of preference handling to reduce the size of output pattern sets. [...] We ran our respective encodings on two classical databases of the UCI collection [Lichman, 2013]: jmlr (a natural language processing database; each transaction is an abstract of a paper from the Journal of Machine Learning Research) and Unix user (each transaction is a series of shell commands executed by a user during one session). |
| Researcher Affiliation | Academia | Martin Gebser (3), Thomas Guyet (1), René Quiniou (2), Javier Romero (3), Torsten Schaub (2,3); 1: AGROCAMPUS-OUEST/IRISA, France; 2: Inria Centre de Rennes Bretagne Atlantique, France; 3: University of Potsdam, Germany |
| Pseudocode | Yes | Listing 2: Basic encoding of frequent sequence mining, Listing 3: Encoding part for selecting maximal patterns, Listing 4: Modifications for selecting closed patterns, Listing 5: Preference type implementation (a minimal ASP sketch of the basic encoding idea is given after this table) |
| Open Source Code | No | The paper states 'The databases used in our experiments are available at https://sites.google.com/site/aspseqmining.' but does not state that the source code implementing its method is available, nor provide a link to it. |
| Open Datasets | Yes | we generated databases using a retro-engineering process: [...] The databases used in our experiments are available at https://sites.google.com/site/aspseqmining. [...] We ran our respective encodings on two classical databases of the UCI collection [Lichman, 2013]: jmlr (a natural language processing database; each transaction is an abstract of a paper from the Journal of Machine Learning Research) and Unix user (each transaction is a series of shell commands executed by a user during one session). |
| Dataset Splits | No | The paper describes generating and using databases but does not specify any explicit train/validation/test splits, percentages, or methodology for data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions 'clingo' and 'asprin' (ASP systems) and 'gecode' (CP solver) but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | It relies on two parameters: max determines a maximum length for patterns of interest, and k specifies the frequency threshold. [...] In our experiments, we vary the mean length from 10 to 40, and contained items are randomly generated according to a Gaussian law (some items are more frequent than others) over a vocabulary of 50 items. [...] The timeout was set to 20 minutes. |
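
The encodings cited in the Pseudocode row (Listings 2-5) are ASP programs intended for the clingo system, with preferences handled via asprin. The block below is a minimal, self-contained sketch of the basic frequent-sequence-mining idea in clingo syntax; the toy database facts, the predicate names, and the constants `maxlen` and `th` (standing in for the paper's `max` and `k` parameters from the Experiment Setup row) are illustrative assumptions, not the paper's exact Listing 2.

```
% Toy database (illustrative, not from the paper):
% seq(T,P,I) -- item I occurs at position P of transaction T.
seq(1,1,a). seq(1,2,b). seq(1,3,c).
seq(2,1,a). seq(2,2,c).
seq(3,1,b). seq(3,2,a). seq(3,3,c).

% Constants standing in for the paper's parameters:
% maxlen ~ max (maximum pattern length), th ~ k (frequency threshold).
#const maxlen = 3.
#const th = 2.

item(I) :- seq(_,_,I).

% Guess a pattern: positions 1..L (L <= maxlen), each holding exactly one item.
patpos(1).
{ patpos(X+1) } :- patpos(X), X < maxlen.
patlen(L) :- patpos(L), not patpos(L+1).
1 { pat(X,I) : item(I) } 1 :- patpos(X).

% Check which transactions embed the pattern as a subsequence.
occ(T,1,P) :- seq(T,P,I), pat(1,I).
occ(T,X,P) :- occ(T,X-1,Q), seq(T,P,I), pat(X,I), Q < P.
support(T) :- occ(T,L,_), patlen(L).

% Keep only patterns supported by at least th transactions.
:- #count{ T : support(T) } < th.

#show pat/2.
```

Enumerating all answer sets (e.g., `clingo encoding.lp 0`) yields one frequent pattern per answer set; the paper's actual encodings additionally cover maximal and closed patterns (Listings 3 and 4) and asprin preference types (Listing 5).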