Abstract

Martin Hofmann-Apitius
Direct Use of Information Extraction from Scientific Text for Modeling and Simulation in the Life Sciences


Scientific biomedical publications are a rich source of information about diseases and about the molecules that play a role in their molecular etiology. With the development of automated methods for identifying named biomedical entities in scientific text ("text mining"), we are now able to automatically screen millions of publications for genes, their relationships to other genes, their role in the development of a disease, and their potential as targets for therapeutic intervention. In fact, modern search engines can already extract terms from scientific text that represent entities directly usable for modeling diseases and simulating disease-relevant molecular networks.

In my presentation, I will demonstrate how scientific text can be analyzed using a combination of algorithmic approaches (dictionary- and rule-based as well as machine-learning-based methods). I will also demonstrate how information extracted from text can be applied in disease modeling approaches that combine heterogeneous types of information extracted from scientific publications (protein-protein interactions, allelic variants of genes, clinical phenotype information). Finally, I will show how the analysis of scientific text can be used to construct "knowledge descriptors" that enable a completely new way of predicting the activity of small pharmaceutical molecules.
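To make the dictionary-based approach concrete, here is a minimal sketch, not the method presented in the talk. It assumes a hypothetical toy dictionary that maps gene synonyms to canonical symbols, splits text into sentences naively, and counts pairs of genes mentioned in the same sentence. Sentence-level co-occurrence is used here only as a simple stand-in for relationship extraction. The names `GENE_DICTIONARY`, `tag_entities` and `cooccurrence_pairs`, as well as the example sentences, are invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy dictionary: surface form (synonym) -> canonical gene symbol.
GENE_DICTIONARY = {
    "APP": "APP",
    "amyloid precursor protein": "APP",
    "PSEN1": "PSEN1",
    "presenilin-1": "PSEN1",
    "APOE": "APOE",
    "apolipoprotein E": "APOE",
}

def tag_entities(sentence, dictionary):
    """Return the set of canonical symbols whose synonyms occur in the sentence."""
    found = set()
    for synonym, canonical in dictionary.items():
        # Whole-word, case-insensitive match; re.escape protects hyphens and similar characters.
        pattern = r"\b" + re.escape(synonym) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.add(canonical)
    return found

def cooccurrence_pairs(text, dictionary):
    """Count pairs of canonical entities that co-occur within the same sentence."""
    counts = Counter()
    # Naive sentence splitter: break on whitespace that follows '.', '!' or '?'.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        entities = sorted(tag_entities(sentence, dictionary))
        for pair in combinations(entities, 2):
            counts[pair] += 1
    return counts

# Invented example text, used only to exercise the sketch.
TEXT = (
    "Mutations in presenilin-1 alter the processing of amyloid precursor protein. "
    "APOE is a risk factor for late-onset disease. "
    "PSEN1 cleaves APP within the membrane."
)

if __name__ == "__main__":
    print(cooccurrence_pairs(TEXT, GENE_DICTIONARY))
```

Because different synonyms are mapped to the same canonical symbol, "presenilin-1" and "PSEN1" are counted as one entity, so the pair (APP, PSEN1) is found twice. Real systems add curated terminologies, disambiguation, rules and machine-learning models on top of this basic lookup.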

Taken together, the talk will hopefully give an idea of how far we really are from using text analytics for direct modeling and simulation in the life sciences.



Bielefeld University Library - last update: 11/12/2008