• Data mining techniques in health informatics: a case study from breast cancer research

      Lu, Jing; Hales, Alan; Rew, David; Keech, Malcolm; Fröhlingsdorf, Christian; Mills-Mullett, Alex; Wette, Christian; Southampton Solent University; University Hospital Southampton; University of Bedfordshire (Springer Verlag, 2015-08-11)
      This paper presents a case study of using data mining techniques to analyse diagnosis and treatment events related to breast cancer. Data from over 16,000 patients has been pre-processed and several data mining techniques have been implemented using Weka (Waikato Environment for Knowledge Analysis). In particular, Generalized Sequential Patterns mining has been used to discover frequent patterns from disease event sequence profiles based on groups of living and deceased patients. Furthermore, five classification models have been evaluated with the objective of classifying patients based on selected attributes. This research showcases the data mining process and the techniques used to transform large amounts of patient data into useful information and potentially valuable patterns to help understand cancer outcomes.
    • Timeline and episode-structured clinical data: pre-processing for Data Mining and analytics

      Lu, Jing; Hales, Alan; Rew, David; Keech, Malcolm; Southampton Solent University; University Hospital Southampton; University of Bedfordshire (Institute of Electrical and Electronics Engineers Inc., 2016-06-23)
      Data mining has been used in the healthcare domain for diagnosis and treatment analysis, resource management and fraud detection. It brings a set of tools and techniques that can be applied to large-scale patient data to discover underlying patterns and provide healthcare professionals with an additional source of knowledge for making decisions. The Southampton Breast Cancer Data System (SBCDS), containing some 16,000 timeline-structured records, is a visually rich and highly intuitive system for the manual and automated transfer of demographic, pathology and treatment data into an episode-based structure. While expansion of the data mining capability in SBCDS is one of the objectives of our research, real-world patient data is generally incomplete and inconsistent, and contains errors. This case study focuses on the data pre-processing stage, in which the raw data is cleaned and the final dataset is prepared for use in data mining and analytics. Some initial results are given for sequential patterns mining and classification which highlight the advantages of the approach.