    • Applications of concurrent access patterns in web usage mining

      Lu, Jing; Keech, Malcolm; Wang, Cuiqing; University of Bedfordshire (Springer, 2013-08)
      This paper builds on the original data mining and modelling research that proposed the discovery of novel structural relation patterns, and applies the approach to web usage mining. The focus here is on concurrent access patterns (CAPs), with an overarching framework setting out the methodology for post-processing web access patterns. Data pre-processing, pattern discovery and pattern analysis all proceed in association with access pattern mining, CAP mining and CAP modelling. Pruning and selection of access patterns take place as necessary, allowing further CAP mining and modelling in the search for the most interesting concurrent access patterns. It is shown that higher-level CAPs can be modelled in a way that brings greater structure to the process of knowledge discovery. Experiments with real-world datasets highlight the applicability of the approach to web navigation.
    • Applications of concurrent sequential patterns in protein data mining

      Wang, Cuiqing; Keech, Malcolm; Lu, Jing; University of Bedfordshire (Springer, 2014)
      Protein sequences of the same family typically share common patterns which imply their structural function and biological relationship. Traditional sequential pattern mining focuses on frequently occurring sub-sequences. However, a number of applications, such as protein motif mining, motivate the search for more structured patterns. This paper builds on the original idea of structural relation patterns and applies the Concurrent Sequential Patterns (ConSP) mining approach in bioinformatics. Specifically, a new method and algorithms are presented that use support vectors as the data structure for extracting novel patterns in protein sequences. Experiments with real-world protein datasets highlight the applicability of the ConSP methodology in protein data mining. The results show the potential for knowledge discovery in the field of protein structure identification.
    • Concurrent sequential patterns mining and frequent partial orders modelling

      Lu, Jing; Keech, Malcolm; Chen, Weiru; Wang, Cuiqing; University of Bedfordshire (Inderscience Publishers, 2013)
      Structural relation patterns have been introduced to extend the search for complex patterns often hidden behind large sequences of data, with applications in, for example, the analysis of customer behaviour, bioinformatics and web mining. In the overall context of frequent itemset mining, the focus of attention in the structural relation patterns family has been on the mining of concurrent sequential patterns, where a companion approach to graph-based modelling can be illuminating. This paper sets out to establish the connection between concurrent sequential patterns and frequent partial orders, which are well known for discovering ordering information from sequence databases. It is shown that frequent partial orders can be derived from concurrent sequential patterns under certain conditions, and worked examples highlight the relationship. Experiments with real and synthetic datasets contrast the results of the data mining and modelling involved.
    • Protein data modelling for concurrent sequential patterns

      Lu, Jing; Keech, Malcolm; Wang, Cuiqing; University of Bedfordshire (DEXA, 2014-09)
      Protein sequences from the same family typically share common patterns which imply their structural function and biological relationship. The challenge of identifying protein motifs is often addressed through mining frequent itemsets and sequential patterns, where post-processing is a useful technique. Earlier work has shown that Concurrent Sequential Patterns mining can be applied in bioinformatics, for example to detect frequently occurring concurrent protein sub-sequences. This paper presents a companion approach to data modelling and visualisation, applying it to real-world protein datasets from the PROSITE and NCBI databases. The results show the potential for graph-based modelling in representing the integration of higher-level patterns common to all or nearly all of the protein sequences.