Leveraging Large Language Models for Extracting Protein-Protein Interactions from Biomedical Corpora
Presentation Type
Poster
Student
Yes
Abstract
The extraction of protein-protein interactions (PPIs) is pivotal to our understanding of genetic mechanisms, disease pathogenesis, and drug development. With the rapid growth of the biomedical literature, automated and accurate PPI extraction has become essential for efficient scientific discovery. This study leverages large language models, specifically generative pre-trained transformers (GPT) and bidirectional encoder representations from transformers (BERT), for the extraction of PPIs. We evaluated the capability of GPT and BERT models for PPI identification using three manually curated gold-standard corpora: Learning Language in Logic (LLL), Human Protein Reference Database (HPRD50), and Interaction Extraction Performance Assessment (IEPA). Notably, BioBERT emerged as the leader, recording the highest recall (91.95%) and an F1-score of 86.84% on the LLL dataset. Interestingly, despite not being trained specifically on biomedical text, GPT-4 achieved commendable performance, with the highest precision (88.37%) and a closely comparable F1-score of 86.49% on the same dataset. On the HPRD50 and IEPA datasets, BERT-based models continued to lead in overall effectiveness; nonetheless, GPT-4 remained closely competitive, demonstrating its potential for accurately detecting PPIs in text. These results suggest promising directions for future work on fine-tuning GPT-4 for specialized tasks in the biomedical domain.
Keywords – PPI, Large Language Model (LLM), GPT, BERT.
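The precision, recall, and F1-score figures reported in the abstract follow the standard definitions computed from true-positive, false-positive, and false-negative counts. A minimal sketch of that arithmetic (the counts below are hypothetical illustrations, not values from the study):

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion counts.

    tp: interactions the model correctly extracted (true positives)
    fp: extracted pairs that are not true interactions (false positives)
    fn: true interactions the model missed (false negatives)
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical example: 9 correct extractions, 2 spurious, 1 missed.
p, r, f1 = precision_recall_f1(9, 2, 1)
print(f"P={p:.2%}  R={r:.2%}  F1={f1:.2%}")
```

The F1-score (harmonic mean) penalizes imbalance between precision and recall, which is why a model can post the best precision yet a slightly lower F1 than a higher-recall competitor, as seen with GPT-4 versus BioBERT on LLL.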
Start Date
2-6-2024 1:00 PM
End Date
2-6-2024 2:00 PM
Location
Volstorff A