An overview of the BIOASQ large-scale biomedical semantic indexing and question answering competition

George Tsatsaronis et al. BMC Bioinformatics. 2015 Apr 30;16:138. doi: 10.1186/s12859-015-0564-6.

Abstract

Background: This article provides an overview of the first BIOASQ challenge, a competition on large-scale biomedical semantic indexing and question answering (QA), which took place between March and September 2013. BIOASQ assesses the ability of systems to semantically index very large numbers of biomedical scientific articles, and to return concise and user-understandable answers to given natural language questions by combining information from biomedical articles and ontologies.

Results: The 2013 BIOASQ competition comprised two tasks, Task 1a and Task 1b. In Task 1a, participants were asked to annotate new PUBMED documents automatically with MESH headings. Twelve teams participated in Task 1a, submitting a total of 46 system runs, and one of the teams performed consistently better than the MTI indexer that NLM uses to suggest MESH headings to curators. Task 1b used benchmark datasets containing 29 development and 282 test English questions, along with gold standard (reference) answers prepared by a team of biomedical experts from around Europe; participants had to produce answers automatically. Three teams participated in Task 1b, submitting 11 system runs. The BIOASQ infrastructure, including the benchmark datasets, the evaluation mechanisms, and the results of the participants and baseline methods, is publicly available.

Conclusions: A publicly available evaluation infrastructure for biomedical semantic indexing and QA has been developed. It includes benchmark datasets and can be used to evaluate systems that: assign MESH headings to published articles or to English questions; retrieve relevant RDF triples from ontologies, and relevant articles and snippets from PUBMED Central; and produce "exact" and paragraph-sized "ideal" answers (summaries). The results of the systems that participated in the 2013 BIOASQ competition are promising. In Task 1a one of the systems performed consistently better than NLM's MTI indexer. In Task 1b the systems received high scores in the manual evaluation of the "ideal" answers; hence, they produced high-quality summaries as answers. Overall, BIOASQ helped obtain a unified view of how techniques from text classification, semantic indexing, document and passage retrieval, question answering, and text summarization can be combined to allow biomedical experts to obtain concise, user-understandable answers to questions reflecting their real information needs.


Figures

Figure 1
Overview of semantic indexing and question answering in the biomedical domain. The BIOASQ challenge focuses on pushing systems towards implementing pipelines that can realize the workflow shown in the figure. Starting with a variety of data sources (lower right corner of the figure), semantic indexing and integration brings the data into a form that can be used to respond effectively to domain-specific questions. A semantic QA system associates ontology concepts with each question and uses the semantic index of the data to retrieve the relevant pieces of information. The retrieved information is then turned into a concise, user-understandable form, which may be, for example, a ranked list of candidate answers (e.g., in factoid questions, like “What are the physiological manifestations of disorder Y?”) or a collection of text snippets, ideally forming a coherent summary (e.g., in “What is known about the metabolism of drug Z?”). The figure also illustrates how these steps are mapped to the BIOASQ challenge tasks: Task 1a is depicted in blue and Task 1b in red.
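To make this workflow concrete, the following toy Python sketch mirrors its three steps: concept annotation, retrieval through a semantic index, and answer formulation. Every data structure and lookup below is invented for illustration; real systems use ontologies such as MESH and semantic indexes over millions of PUBMED articles.

    ONTOLOGY = {"metabolism": "C1", "drug z": "C2"}        # term -> concept id (toy data)
    SEMANTIC_INDEX = {("C1", "C2"): ["doc-17", "doc-42"]}  # concept ids -> documents (toy data)
    SNIPPETS = {"doc-17": "Drug Z is metabolized by ...",
                "doc-42": "Hepatic enzymes clear drug Z ..."}

    def answer(question):
        # 1. Associate ontology concepts with the question.
        concepts = tuple(cid for term, cid in ONTOLOGY.items()
                         if term in question.lower())
        # 2. Use the semantic index of the data to retrieve relevant documents.
        docs = SEMANTIC_INDEX.get(concepts, [])
        # 3. Turn the retrieved snippets into a concise, user-understandable form.
        return " ".join(SNIPPETS[d] for d in docs)

    print(answer("What is known about the metabolism of drug Z?"))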
Figure 2
Interesting cases when evaluating hierarchical classifiers: (a) over-specialization, (b) under-specialization, (c) alternative paths, (d) pairing problem, (e) long distance problem. Nodes surrounded by circles are the true classes, while nodes surrounded by rectangles are the predicted classes. LCaF is based on the notion of adding all ancestors of the predicted (rectangles) and true (circles) classes. However, adding all ancestors has the undesirable effect of over-penalizing errors that occur at nodes with many ancestors. Thus, LCaF uses the notion of the Lowest Common Ancestor to limit the addition of ancestors.
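As a concrete illustration of this idea, here is a short Python sketch of an LCA-limited hierarchical F-measure. The simple child-to-parent tree encoding and all function names are assumptions made for this example; the official LCaF measure used in BIOASQ is defined more generally over the MESH hierarchy.

    def path_to_root(node, parent):
        """Nodes from `node` up to the root, inclusive (child -> parent map)."""
        path = [node]
        while node in parent:
            node = parent[node]
            path.append(node)
        return path

    def lca(a, b, parent):
        """Lowest common ancestor of a and b in a tree."""
        anc_b = set(path_to_root(b, parent))
        for n in path_to_root(a, parent):  # first node on a's path that is above b
            if n in anc_b:
                return n
        return None

    def augment(nodes, others, parent):
        """Add each node's ancestors, but only up to the nearest LCA with the
        other set, so deep nodes are not over-penalized."""
        augmented = set(nodes)
        for n in nodes:
            path = path_to_root(n, parent)
            cut = min((path.index(lca(n, o, parent)) for o in others), default=0)
            augmented.update(path[:cut + 1])
        return augmented

    def lca_f(true, pred, parent):
        t = augment(true, pred, parent)
        p = augment(pred, true, parent)
        if not t or not p:
            return 0.0
        prec = len(t & p) / len(p)
        rec = len(t & p) / len(t)
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    # Toy tree: root -> {A, B}, A -> {A1, A2}.
    parent = {"A": "root", "B": "root", "A1": "A", "A2": "A"}
    print(lca_f({"A1"}, {"A2"}, parent))  # -> 0.5: siblings get partial credit via "A"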
Figure 3
An example of an article-offset pair. Article 1 has n characters and a golden snippet starting at offset 3 and ending at offset 10.
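The article-offset representation supports a simple character-level overlap evaluation, sketched below in Python. The tuple layout and function names are assumptions for illustration, not the official BIOASQ evaluation code.

    def to_char_set(snippets):
        """Expand (article_id, start, end) triples into a set of
        (article_id, character_offset) pairs; offsets are inclusive here."""
        chars = set()
        for article_id, start, end in snippets:
            for offset in range(start, end + 1):
                chars.add((article_id, offset))
        return chars

    def snippet_precision_recall(golden, returned):
        g, r = to_char_set(golden), to_char_set(returned)
        overlap = len(g & r)
        precision = overlap / len(r) if r else 0.0
        recall = overlap / len(g) if g else 0.0
        return precision, recall

    # Golden snippet of Article 1 spans offsets 3..10; a system returns 5..12.
    print(snippet_precision_recall([("Article 1", 3, 10)], [("Article 1", 5, 12)]))
    # -> (0.75, 0.75): 6 of the 8 returned characters overlap the 8 golden ones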
Figure 4
Screenshot of the annotation tool’s search and data selection screen with the section for document results expanded. The search interface accepts a number of keywords that are sent in parallel to each of the GOPUBMED services. Upon retrieval of the last response, results are combined and returned to the frontend. The client creates one request for each of the result domains (concepts, documents, statements). Whenever results are retrieved for a domain, the respective section of the GUI is updated immediately. Each search result displays the title of the result.
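The per-domain fan-out described above can be sketched with Python's asyncio: one request per result domain, with each GUI section updated as soon as its own response arrives. The service call and callback below are placeholders, not part of the actual GOPUBMED API.

    import asyncio

    async def fetch_domain(domain, query):
        await asyncio.sleep(0.1)  # stands in for a call to a GOPUBMED service
        return domain, [f"{domain} result for '{query}'"]

    def update_gui_section(domain, results):
        print(f"[{domain}] {results}")  # stands in for updating one GUI section

    async def search(query):
        tasks = [asyncio.create_task(fetch_domain(d, query))
                 for d in ("concepts", "documents", "statements")]
        for fut in asyncio.as_completed(tasks):  # update each section on arrival
            domain, results = await fut
            update_gui_section(domain, results)

    asyncio.run(search("drug Z metabolism"))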
Figure 5
Screenshot of the answer formulation and annotation with document snippets. The process of formulating the answer to the selected question and its annotation with document snippets by the domain expert is shown. The user can either dismiss items that were selected in the previous step, or add snippets (i.e., document fragments) as annotations to the answer.

