TechMiner: Extracting Technologies from Academic Publications

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10024)

Included in the following conference series:

European Knowledge Acquisition Workshop

Abstract

In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, and citations. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach that combines NLP, machine learning and semantic technologies to mine technologies from research publications and generate an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; and studying the scholarly dynamics associated with the emergence of new technologies. TechMiner was evaluated on a manually annotated gold standard, and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features significantly improve both recall and precision.

Notes

1. http://ontoware.org/swrc/.
2. http://bibliontology.com.
3. https://www.w3.org/TR/prov-o/.
4. http://www.aktors.org/publications/ontology.
5. https://www.force11.org/group/software-citation-working-group.
6. https://www.elsevier.com/solutions/scopus.
7. The ontologies, the JAPE rules and all the materials used for the evaluation are available at http://technologies.kmi.open.ac.uk/rexplore/ekaw2016/techminer/.
8. http://cui.unige.ch/~deribauh/Ontologies/sciObjCS.owl.
9. http://cui.unige.ch/~deribauh/Ontologies/verbSciOnto.owl.
10. http://cui.unige.ch/~deribauh/Ontologies/scientificObject.owl.
11. https://wordnet.princeton.edu/wordnet/.
12. http://technologies.kmi.open.ac.uk/rexplore/ontologies/BiboExtension.owl.
13. http://www.w3.org/2004/02/skos/.
14. https://gate.ac.uk/.
15. http://cui.unige.ch/~deribauh/Ontologies/TechMiner.owl.
16. www.xmlns.com/foaf/0.1/.
17. http://salt.semanticauthoring.org/ontologies/sro.
18. http://purl.org/spar/frbr.

Acknowledgements

We thank Elsevier for providing us with access to the Scopus repository of scholarly data. We also acknowledge grant n° 159047 from the Swiss National Foundation.

Author information

Authors and Affiliations

Knowledge Media Institute, The Open University, Milton Keynes, UK
Francesco Osborne, Hélène de Ribaupierre & Enrico Motta
Department of Computer Science, University of Oxford, Oxford, UK
Hélène de Ribaupierre & Enrico Motta

Authors

Francesco Osborne
Hélène de Ribaupierre
Enrico Motta

Corresponding author

Correspondence to Francesco Osborne.

Editor information

Editors and Affiliations

Linköping University, Linköping, Sweden
Eva Blomqvist
University of Bologna, Bologna, Italy
Paolo Ciancarini
University of Bologna, Bologna, Italy
Francesco Poggi
University of Bologna, Bologna, Italy
Fabio Vitali

About this paper

Cite this paper

Osborne, F., de Ribaupierre, H., Motta, E. (2016). TechMiner: Extracting Technologies from Academic Publications. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science, vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_30

DOI: https://doi.org/10.1007/978-3-319-49004-5_30
Published: 04 November 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-49003-8
Online ISBN: 978-3-319-49004-5
eBook Packages: Computer Science, Computer Science (R0)
