Overview of the CLEF 2024 SimpleText Track

Improving Access to Scientific Texts for Everyone

  • Conference paper
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2024)

Abstract

Everyone acknowledges the importance of objective scientific information. However, finding and understanding relevant scientific documents is often challenging due to complex terminology and readers’ lack of prior knowledge. Can we improve accessibility for everyone? This paper presents an overview of the SimpleText Track at CLEF 2024, which addresses the technical and evaluation challenges of making scientific information accessible to a wide audience, including students and non-experts. It describes the data and benchmarks provided for scientific text summarization and simplification, along with the participants’ results. The CLEF 2024 SimpleText track comprises four interrelated tasks: Task 1, Content Selection: retrieving passages to include in a simplified summary; Task 2, Complexity Spotting: identifying and explaining difficult concepts; Task 3, Text Simplification: simplifying scientific text; and Task 4, SOTA?: tracking the state-of-the-art in scholarly publications.
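Task 1 (content selection) asks systems to rank passages worth including in a simplified summary. As a rough, self-contained illustration of that ranking setup — not the method of any participant, who typically used neural retrievers and rerankers such as the MiniLM models listed in the notes — the sketch below scores passages against a query by cosine similarity of term-frequency vectors, using only the Python standard library:

```python
import math
import re
from collections import Counter

def rank_passages(query, passages):
    """Rank passages by cosine similarity of term-frequency vectors.

    A toy stand-in for content selection: real track submissions used
    stronger retrievers, but the rank-by-relevance principle is the same.
    """
    def vec(text):
        # Bag-of-words term frequencies over lowercased word tokens.
        return Counter(re.findall(r"\w+", text.lower()))

    def cosine(a, b):
        dot = sum(a[t] * b[t] for t in set(a) & set(b))
        na = math.sqrt(sum(v * v for v in a.values()))
        nb = math.sqrt(sum(v * v for v in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = vec(query)
    return sorted(passages, key=lambda p: cosine(q, vec(p)), reverse=True)

passages = [
    "Deep learning models achieve high accuracy on image tasks.",
    "Digital assistants answer questions in natural language.",
    "Soil chemistry affects crop yield in arid regions.",
]
ranked = rank_passages("natural language question answering", passages)
print(ranked[0])
```

In practice, lexical scoring like this misses paraphrases; that gap is what motivates the dense sentence embeddings and cross-encoder rerankers referenced in the notes.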


Notes

  1. A joint effort with other initiatives such as Scholarly Document Processing, https://sdproc.org/2024/.

  2. https://www.aminer.cn/citation.

  3. https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2.

  4. https://pyterrier.readthedocs.io/.

  5. https://www.meilisearch.com/.

  6. https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-12-v2.

  7. https://github.com/pgvector/pgvector.

  8. Henceforth, we use ‘difficult term’ to denote a term that designates a difficult concept.

  9. https://cran.r-project.org/web/packages/sacRebleu/vignettes/sacReBLEU.html.


Acknowledgments

This research was funded, in whole or in part, by the French National Research Agency (ANR) Automatic Simplification of Scientific Texts project (ANR-22-CE23-0019-01) (https://anr.fr/Project-ANR-22-CE23-0019). We also thank the MaDICS research group (https://www.madics.fr/ateliers/simpletext/). The SOTA Task is jointly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - project number: NFDI4DataScience (460234259) and the German BMBF project SCINEXT (01lS22070).

This track would not have been possible without the great support of numerous individuals. We want to thank in particular the colleagues and the students who participated in data construction and evaluation. Please visit the SimpleText website for more details on the track (http://simpletext-project.com).

Author information

Correspondence to Jaap Kamps.


Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper


Cite this paper

Ermakova, L. et al. (2024). Overview of the CLEF 2024 SimpleText Track. In: Goeuriot, L., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2024. Lecture Notes in Computer Science, vol 14959. Springer, Cham. https://doi.org/10.1007/978-3-031-71908-0_13

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-71908-0_13


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-71907-3

  • Online ISBN: 978-3-031-71908-0

  • eBook Packages: Computer Science, Computer Science (R0)
