Overview of the CLEF 2024 SimpleText Track

Improving Access to Scientific Texts for Everyone

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2024)

Abstract

Everyone acknowledges the importance of objective scientific information. However, finding and understanding relevant scientific documents is often challenging due to complex terminology and readers’ lack of prior knowledge. Can we improve accessibility for everyone? This paper presents an overview of the SimpleText Track at CLEF 2024, which addresses the technical and evaluation challenges of making scientific information accessible to a wide audience, including students and non-experts. It describes the data and benchmarks provided for scientific text summarization and simplification, along with the participants’ results. The CLEF 2024 SimpleText track comprises four interrelated tasks: Task 1, Content Selection: retrieving passages to include in a simplified summary; Task 2, Complexity Spotting: identifying and explaining difficult concepts; Task 3, Text Simplification: simplifying scientific text; and Task 4, SOTA?: tracking the state-of-the-art in scholarly publications.
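Task 1 (content selection) asks systems to rank passages worth including in a simplified summary. As a rough, self-contained illustration of that ranking setup — not the method of any participant, who typically used neural retrievers and rerankers such as the MiniLM models listed in the notes — the sketch below scores passages against a query by cosine similarity of term-frequency vectors, using only the Python standard library:

```python
import math
import re
from collections import Counter

def rank_passages(query, passages):
    """Rank passages by cosine similarity of term-frequency vectors.

    A toy stand-in for content selection: real track submissions used
    stronger retrievers, but the rank-by-relevance principle is the same.
    """
    def vec(text):
        # Bag-of-words term frequencies over lowercased word tokens.
        return Counter(re.findall(r"\w+", text.lower()))

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in set(a) & set(b))
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    return sorted(passages, key=lambda p: cosine(q, vec(p)), reverse=True)

passages = [
    "Deep learning models achieve high accuracy on image tasks.",
    "Digital assistants answer questions in natural language.",
    "Soil chemistry affects crop yield in arid regions.",
]
ranked = rank_passages("natural language question answering", passages)
print(ranked[0])
```

In practice, lexical scoring like this misses paraphrases; that gap is what motivates the dense sentence embeddings and cross-encoder rerankers referenced in the notes.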


Notes

  1. A joint effort with other initiatives such as Scholarly Document Processing, https://sdproc.org/2024/.

  2. https://www.aminer.cn/citation.

  3. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2.

  4. https://pyterrier.readthedocs.io/.

  5. https://www.meilisearch.com/.

  6. https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2.

  7. https://github.com/pgvector/pgvector.

  8. Henceforth, we use ‘difficult term’ to denote a term that designates a difficult concept.

  9. https://cran.r-project.org/web/packages/sacRebleu/vignettes/sacReBLEU.html.


Acknowledgments

This research was funded, in whole or in part, by the French National Research Agency (ANR) Automatic Simplification of Scientific Texts project (ANR-22-CE23-0019-01) (https://anr.fr/Project-ANR-22-CE23-0019). We also thank the MaDICS research group (https://www.madics.fr/ateliers/simpletext/). The SOTA Task is jointly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - project number: NFDI4DataScience (460234259) and the German BMBF project SCINEXT (01lS22070).

This track would not have been possible without the great support of numerous individuals. We want to thank in particular the colleagues and the students who participated in data construction and evaluation. Please visit the SimpleText website for more details on the track (http://simpletext-project.com).

Author information

Correspondence to Jaap Kamps.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ermakova, L. et al. (2024). Overview of the CLEF 2024 SimpleText Track. In: Goeuriot, L., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2024. Lecture Notes in Computer Science, vol 14959. Springer, Cham. https://doi.org/10.1007/978-3-031-71908-0_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-71908-0_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-71907-3

  • Online ISBN: 978-3-031-71908-0

  • eBook Packages: Computer Science, Computer Science (R0)
