Skip to main content

TechMiner: Extracting Technologies from Academic Publications

  • Conference paper
  • First Online:
Knowledge Engineering and Knowledge Management (EKAW 2016)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 10024))

Included in the following conference series:

  • 2454 Accesses

  • 10 Citations

Abstract

In recent years we have seen the emergence of a variety of scholarly datasets. Typically these capture ‘standard’ scholarly entities and their connections, such as authors, affiliations, venues, publications, citations, and others. However, as the repositories grow and the technology improves, researchers are adding new entities to these repositories to develop a richer model of the scholarly domain. In this paper, we introduce TechMiner, a new approach, which combines NLP, machine learning and semantic technologies, for mining technologies from research publications and generating an OWL ontology describing their relationships with other research entities. The resulting knowledge base can support a number of tasks, such as: richer semantic search, which can exploit the technology dimension to support better retrieval of publications; richer expert search; monitoring the emergence and impact of new technologies, both within and across scientific fields; studying the scholarly dynamics associated with the emergence of new technologies; and others. TechMiner was evaluated on a manually annotated gold standard and the results indicate that it significantly outperforms alternative NLP approaches and that its semantic features improve performance significantly with respect to both recall and precision.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+
from $39.99 /Month
  • Starting from 10 chapters or articles per month
  • Access and download chapters and articles from more than 300k books and 2,500 journals
  • Cancel anytime
View plans

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
eBook
USD 84.99
Price excludes VAT (USA)
Softcover Book
USD 109.99
Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

Notes

  1. 1.

    http://ontoware.org/swrc/.

  2. 2.

    http://bibliontology.com.

  3. 3.

    https://www.w3.org/TR/prov-o/.

  4. 4.

    http://www.aktors.org/publications/ontology.

  5. 5.

    https://www.force11.org/group/software-citation-working-group.

  6. 6.

    https://www.elsevier.com/solutions/scopus.

  7. 7.

    The ontologies, the JAPE rules and all the materials used for the evaluation is available at http://technologies.kmi.open.ac.uk/rexplore/ekaw2016/techminer/.

  8. 8.

    http://cui.unige.ch/~deribauh/Ontologies/sciObjCS.owl.

  9. 9.

    http://cui.unige.ch/~deribauh/Ontologies/verbSciOnto.owl.

  10. 10.

    http://cui.unige.ch/~deribauh/Ontologies/scientificObject.owl.

  11. 11.

    https://wordnet.princeton.edu/wordnet/.

  12. 12.

    http://technologies.kmi.open.ac.uk/rexplore/ontologies/BiboExtension.owl.

  13. 13.

    http://www.w3.org/2004/02/skos/.

  14. 14.

    https://gate.ac.uk/.

  15. 15.

    http://cui.unige.ch/~deribauh/Ontologies/TechMiner.owl.

  16. 16.

    www.xmlns.com/foaf/0.1/.

  17. 17.

    http://salt.semanticauthoring.org/ontologies/sro.

  18. 18.

    http://purl.org/spar/frbr.

References

  1. Moller, K., Heath, T., Handschuh, S., Domingue, J.: Recipes for semantic web dog food—the ESWC and ISWC metadata projects. In: 6th International Semantic Web Conference, 11–15 November 2007, Busan, South Korea (2007)

    Google Scholar 

  2. Glaser, H., Millard, I.: Knowledge-enabled research support: RKBExplorer.com. In: Proceedings of Web Science 2009, Athens, Greece (2009)

    Google Scholar 

  3. Dumontier, M., Callahan, A., Cruz-Toledo, J., Ansell, P., Emonet, V., Belleau, F., Droit, A.: Bio2RDF release 3: a larger connected network of linked data for the life sciences. In: 2014 International Semantic Web Conference (Posters & Demos) (2014)

    Google Scholar 

  4. Carpenter, B.: LingPipe for 99.99 % recall of gene mentions. In: Proceedings of the Second BioCreative Challenge Evaluation Workshop, vol. 23, pp. 307–309 (2007)

    Google Scholar 

  5. Corbett, P., Copestake, A.: Cascaded classifiers for confidence-based chemical named entity recognition. BMC Bioinform. 9(11), 1 (2008)

    Google Scholar 

  6. Liakata, M., Teufel, S., Siddharthan, A., Batchelor, C.R.: Corpora for the conceptualisation and zoning of scientific papers. In: LREC (2010)

    Google Scholar 

  7. Groza, T.: Using typed dependencies to study and recognise conceptualisation zones in biomedical literature. PLoS ONE 8(11), e79570 (2013)

    Article  Google Scholar 

  8. de Ribaupierre, H., Falquet, G.: User-centric design and evaluation of a semantic annotation model for scientific documents. In: Proceedings of the 14th International Conference on Knowledge Technologies and Data-driven (2014)

    Google Scholar 

  9. Augenstein, I., Padó, S., Rudolph, S.: LODifier: generating linked data from unstructured text. In: The Semantic Web: Research and Applications, pp. 210–224 (2012)

    Google Scholar 

  10. Usbeck, R., Ngonga Ngomo, A.-C., Röder, M., Gerber, D., Coelho, S.A., Auer, S., Both, A.: AGDISTIS - graph-based disambiguation of named entities using linked data. In: Mika, P. (ed.) ISWC 2014. LNCS, vol. 8796, pp. 457–471. Springer, Heidelberg (2014). doi:10.1007/978-3-319-11964-9_29

    Google Scholar 

  11. Sateli, B., Witte, R.: What’s in this paper? Combining rhetorical entities with linked open data for semantic literature querying. In: Proceedings of the 24th International Conference on World Wide Web Companion, pp. 1023–1028 (2015)

    Google Scholar 

  12. Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., Hellmann, S.: DBpedia-a crystallization point for the web of data. Web Semant. Sci. Serv. Agents World Wide Web 7(3), 154–165 (2009)

    Article  Google Scholar 

  13. Bandrowski, A., Brush, M., Grethe, J.S., Haendel, M.A., Kennedy, D.N., Hill, S., Hof, P.R., Martone, M.E., Pols, M., Tan, S.C., Washington, N.: The resource identification initiative: a cultural shift in publishing. J. Comparat. Neurol. 524(1), 8–22 (2016)

    Article  Google Scholar 

  14. Scanning Douw, K., Vondeling, H., Eskildsen, D., Simpson, S.: Use of the Internet in scanning the horizon for new and emerging health technologies: a survey of agencies involved in horizon scanning. J. Med. Internet Res. 5(1), e6 (2003)

    Article  Google Scholar 

  15. Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)

    MATH  Google Scholar 

  16. Osborne, F., Motta, E.: Klink-2: integrating multiple web sources to generate semantic topic networks. In: Arenas, M., et al. (eds.) ISWC 2015. LNCS, vol. 9366, pp. 408–424. Springer, Heidelberg (2015). doi:10.1007/978-3-319-25007-6_24

    Chapter  Google Scholar 

  17. de Ribaupierre, H., Falquet, G.:, An automated annotation process for the SciDocAnnot scientific document model. In: Proceedings of the Fifth International Workshop on Semantic Digital Archives, TPDL 2015 (2015)

    Google Scholar 

  18. Osborne, F., Motta, E., Mulholland, P.: Exploring scholarly data with rexplore. In: Alani, H., Kagal, L., Fokoue, A., Groth, P., Biemann, C., Parreira, J.X., Aroyo, L., Noy, N., Welty, C., Janowicz, K. (eds.) ISWC 2013. LNCS, vol. 8218, pp. 460–477. Springer, Heidelberg (2013). doi:10.1007/978-3-642-41335-3_29

    Chapter  Google Scholar 

  19. de Ribaupierre, H., Osborne, F., Motta, E.: Combining NLP and semantics for mining software technologies from research publications. In: Proceedings of the 25th International Conference on World Wide Web (Companion Volume) (2016)

    Google Scholar 

  20. Huang, W.: Do ABCs get more citations than XYZs? Econ. Inq. 53(1), 773–789 (2015)

    Article  Google Scholar 

  21. Mendes, P.N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia spotlight: shedding light on the web of documents. In: Proceedings of the 7th International Conference on Semantic Systems, pp. 1–8. ACM (2011)

    Google Scholar 

  22. Peroni, S., Shotton, D.: FaBiO and CiTO: ontologies for describing bibliographic resources and citations. Web Semant. Sci. Serv. Agents World Wide Web 17, 33–43 (2012)

    Article  Google Scholar 

  23. Ibekwe-SanJuan, F., Fernandez, S., Sanjuan, E., Charton, E.: Annotation of scientific summaries for information retrieval (2011). arXiv preprint arXiv:1110.5722

  24. O’Seaghdha, D., Teufel, S.: Unsupervised learning of rhetorical structure with un-topic models. In: Proceedings of the 25th International Conference on Computational Linguistics (COLING 2014) (2014)

    Google Scholar 

  25. Ronzano, F., Saggion, H.: Dr. inventor framework: extracting structured information from scientific publications. In: Japkowicz, N., Matwin, S. (eds.) DS 2015. LNCS (LNAI), vol. 9356, pp. 209–220. Springer, Heidelberg (2015). doi:10.1007/978-3-319-24282-8_18

    Chapter  Google Scholar 

  26. Bordea, G., Buitelaar, P., Polajnar, T.: Domain-independent term extraction through domain modelling. In: The 10th International Conference on Terminology and Artificial Intelligence (TIA 2013), Paris, France (2013)

    Google Scholar 

Download references

Acknowledgements

We thank Elsevier for providing us with access to the Scopus repository of scholarly data. We also acknowledge grant n° 159047 from the Swiss National Foundation.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Francesco Osborne .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Osborne, F., de Ribaupierre, H., Motta, E. (2016). TechMiner: Extracting Technologies from Academic Publications. In: Blomqvist, E., Ciancarini, P., Poggi, F., Vitali, F. (eds) Knowledge Engineering and Knowledge Management. EKAW 2016. Lecture Notes in Computer Science(), vol 10024. Springer, Cham. https://doi.org/10.1007/978-3-319-49004-5_30

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-49004-5_30

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-49003-8

  • Online ISBN: 978-3-319-49004-5

  • eBook Packages: Computer ScienceComputer Science (R0)

Keywords

Publish with us

Policies and ethics