Abstract
Everyone acknowledges the importance of objective scientific information. However, finding and understanding relevant scientific documents is often challenging due to complex terminology and readers’ lack of prior knowledge. Can we improve accessibility for everyone? This paper presents an overview of the SimpleText Track at CLEF 2024, which addresses the technical and evaluation challenges of making scientific information accessible to a wide audience, including students and non-experts. It describes the data and benchmarks provided for scientific text summarization and simplification, along with the participants’ results. The CLEF 2024 SimpleText track is based on four interrelated tasks: Task 1 on Content Selection: Retrieving Passages to Include in a Simplified Summary; Task 2 on Complexity Spotting: Identifying and Explaining Difficult Concepts; Task 3 on Text Simplification: Simplify Scientific Text; and Task 4 on SOTA?: Tracking the State-of-the-Art in Scholarly Publications.
Notes
- 1.
A joint effort with others, such as Scholarly Document Processing (https://sdproc.org/2024/).
- 8.
Henceforth, we will use ‘difficult term’ to indicate a term that designates a difficult concept.
Acknowledgments
This research was funded, in whole or in part, by the French National Research Agency (ANR) Automatic Simplification of Scientific Texts project (ANR-22-CE23-0019-01) (https://anr.fr/Project-ANR-22-CE23-0019). We also thank the MaDICS research group (https://www.madics.fr/ateliers/simpletext/). The SOTA Task is jointly funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) - project number: NFDI4DataScience (460234259) and the German BMBF project SCINEXT (01lS22070).
This track would not have been possible without the great support of numerous individuals. We want to thank in particular the colleagues and the students who participated in data construction and evaluation. Please visit the SimpleText website for more details on the track (http://simpletext-project.com).
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ermakova, L. et al. (2024). Overview of the CLEF 2024 SimpleText Track. In: Goeuriot, L., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2024. Lecture Notes in Computer Science, vol 14959. Springer, Cham. https://doi.org/10.1007/978-3-031-71908-0_13
DOI: https://doi.org/10.1007/978-3-031-71908-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-71907-3
Online ISBN: 978-3-031-71908-0
eBook Packages: Computer Science, Computer Science (R0)