Large-scale biomedical relation extraction across diverse relation types: model development and usability study on COVID-19
Name:
Large-ScaleBiomedicalRelationE ...
Size:
1.808Mb
Format:
PDF
Description:
final published version
Authors
Zhang, Zeyu
Fang, Meng
Wu, Rebecca
Zong, Hui
Huang, Honglian
Tong, Yuantao
Xie, Yujia
Cheng, Shiyang
Wei, Ziyi
Crabbe, M. James C.
Zhang, Xiaoyan
Wang, Ying
Affiliation
Tongji University
Shanghai University of Traditional Chinese Medicine
Shanghai Eastern Hepatobiliary Surgery Hospital
University of California, Berkeley
Sichuan University
Oxford University
University of Bedfordshire
Shanxi University
Issue Date
2023-09-20
Subjects
COVID-19
biomedical relation extraction
biomedical text mining
clinical drug path
knowledge discovery
knowledge graph
pretrained language model
task-adaptive pretraining
Subject Categories::G700 Artificial Intelligence
Abstract
Biomedical relation extraction (RE) is of great importance for researchers to conduct systematic biomedical studies. It not only helps knowledge mining, such as knowledge graphs and novel knowledge discovery, but also promotes translational applications, such as clinical diagnosis, decision-making, and precision medicine. However, the relations between biomedical entities are complex and diverse, and comprehensive biomedical RE is not yet well established. We aimed to investigate and improve large-scale RE with diverse relation types and conduct usability studies with application scenarios to optimize biomedical text mining. Data sets containing 125 relation types with different entity semantic levels were constructed to evaluate the impact of entity semantic information on RE, and performance analysis was conducted on different model architectures and domain models. This study also proposed a continued pretraining strategy and integrated models with scripts into a tool. Furthermore, this study applied RE to the COVID-19 corpus with article topics and application scenarios of clinical interest to assess and demonstrate its biological interpretability and usability. The performance analysis revealed that RE achieves the best performance when the detailed semantic type is provided. For a single model, PubMedBERT with continued pretraining performed the best, with an F1-score of 0.8998. Usability studies on COVID-19 demonstrated the interpretability and usability of RE, and a relation graph database was constructed, which was used to reveal existing and novel drug paths with edge explanations. The models (including pretrained and fine-tuned models), integrated tool (Docker), and generated data (including the COVID-19 relation graph database and drug paths) have been made publicly available to the biomedical text mining community and clinical researchers. This study provided a comprehensive analysis of RE with diverse relation types. 
Optimized RE models and tools for diverse relation types were developed, which can be widely used in biomedical text mining. Our usability studies provided a proof-of-concept demonstration of how large-scale RE can be leveraged to facilitate novel research.
Citation
Zhang Z, Fang M, Wu R, Zong H, Huang H, Tong Y, Xie Y, Cheng S, Wei Z, Crabbe MJC, Zhang X, Wang Y (2023) 'Large-scale biomedical relation extraction across diverse relation types: model development and usability study on COVID-19', Journal of medical Internet research, 25 (e48115)
Publisher
JMIR Publications
DOI
10.2196/48115
PubMed ID
37632414
Additional Links
https://www.jmir.org/2023/1/e48115
Type
Article
Language
en
ISSN
1438-8871
EISSN
1438-8871
Sponsors
This work was supported by the National Natural Science Foundation of China (81972914 and 81573023), the Innovation Group Project of Shanghai Municipal Health Commission (2019CXJQ03), the Fundamental Research Funds for the Central Universities (22120200014), and the Shanghai “Rising Stars of Medical Talent” Youth Development Program (2019-72).
The following license files are associated with this item:
- Creative Commons
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International
Related articles
- BertSRC: transformer-based semantic relation classification.
- Authors: Lee Y, Son J, Song M
- Issue date: 2022 Sep 6
- Enhancing the coverage of SemRep using a relation classification approach.
- Authors: Ming S, Zhang R, Kilicoglu H
- Issue date: 2024 Jul
- Exploiting semantic patterns over biomedical knowledge graphs for predicting treatment and causative relations.
- Authors: Bakal G, Talari P, Kakani EV, Kavuluru R
- Issue date: 2018 Jun
- Extraction of semantic biomedical relations from text using conditional random fields.
- Authors: Bundschus M, Dejori M, Stetter M, Tresp V, Kriegel HP
- Issue date: 2008 Apr 23
- Ensemble pretrained language models to extract biomedical knowledge from literature.
- Authors: Li Z, Wei Q, Huang LC, Li J, Hu Y, Chuang YS, He J, Das A, Keloth VK, Yang Y, Diala CS, Roberts KE, Tao C, Jiang X, Zheng WJ, Xu H
- Issue date: 2024 Sep 1