AQUACOLD: Aggregated Query Understanding and Construction Over Linked Data
Natural Language Processing
Subject Categories::G560 Data Management
Abstract: Question Answering (QA) systems provide direct answers to natural language (NL) questions posed by humans. Linked data (LD) provides an ideal knowledge base for answering complex questions: the framework expresses structure and relationships between data which assist in parsing the question, and the open ‘web of data’, or knowledge graph, formed by interlinking LD nodes provides a vast and varied domain of knowledge to search over. Despite this, recent attempts at NL QA over LD struggle with complex questions because of the challenges in automatically parsing natural language into a structured LD query language such as SPARQL. This forces end users to learn these languages, which can be challenging without a technical background. There is a need for a system that returns accurate answers to complex natural language questions over linked data, improving the accessibility of linked data search by abstracting the complexity of SPARQL whilst retaining its expressivity. This thesis presents AQUACOLD (Aggregated Query Understanding And Construction Over Linked Data), a novel LD QA system which harnesses the power of crowdsourcing to meet this need. AquaCold uses query templates built by system users to answer questions, rather than an algorithmic solution, and as such can handle queries of significant complexity. AquaCold’s effectiveness as an NL LD QA system was evaluated using the standard IR metrics of precision, recall and f-score on the QALD-9 question set, a benchmark used by many comparable NL QA systems. Thirty participants took part in the study, attempting to answer a subset of QALD-9 questions using AquaCold.
Results were analysed and compared against published results for similar NL LD QA systems. The comparison covered the AquaCold system overall, the dimension of user IT skill (to evaluate utility for non-technical users specifically), and the different crowdsourced components of the system (to evaluate the utility of each). AquaCold performed strongly in the QALD-9 benchmark study, recording greater f-score and query coverage results than comparable systems. Non-technical users achieved better scores when all or part of the question could be answered using a query template, but worse scores when no template was available and answers had to be obtained using the query builder component instead. This indicates a viable workflow in which technically skilled users create templates that less technically able users can use to answer questions.
Citation: Collis, N. (2021) 'AQUACOLD: Aggregated Query Understanding and Construction Over Linked Data'. PhD thesis. University of Bedfordshire.
Publisher: University of Bedfordshire
Type: Thesis or dissertation
Description: A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements of the degree of Doctor of Philosophy.
The following license files are associated with this item:
- Creative Commons
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International