The introduction of high-throughput methods has transformed biology into a data-rich science. Knowledge about biological entities and processes has traditionally been acquired by thousands of scientists through decades of experimentation and analysis. The current abundance of biomedical data is accompanied by the creation and quick dissemination of new information. Much of this information and knowledge, however, is represented only in text form--in the biomedical literature, lab notebooks, Web pages, and other sources. Researchers' need to find relevant information in the vast amounts of text has created a surge of interest in automated text-analysis. In this book, Hagit Shatkay and Mark Craven offer a concise and accessible introduction to key ideas in biomedical text mining. The chapters cover such topics as the relevant sources of biomedical text; text-analysis methods in natural language processing; the tasks of information extraction, information retrieval, and text categorization; and methods for empirically assessing text-mining systems. Finally, the authors describe several applications that recognize entities in text and link them to other entities and data resources, support the curation of structured databases, and make use of text to enable further prediction and discovery.
Hagit Shatkay is Associate Professor in the Department of Computer and Information Sciences and Head of the Computational Biomedicine Lab at the University of Delaware. Mark Craven is Professor in the Department of Biostatistics and Medical Informatics and in the Department of Computer Sciences at the University of Wisconsin.