Abstract
Plagiarism is one of the most serious issues in the academic domain because of its effect on the credibility of scientific research. One recognized form of plagiarism involves translating text from one language into another. Such cross-lingual plagiarism continues to proliferate owing to the abundance of online information and the availability of translation tools, especially those based on artificial intelligence. Multilingual transformers offer a promising approach to detecting this kind of plagiarism. In this paper, we propose a new methodology for cross-language plagiarism detection based on the multilingual pretrained model mT5 combined with a multi-head attention mechanism (MHAM). The approach comprises text preprocessing, embedding with mT5, attention layers, and a sigmoid output function. To evaluate its efficiency, we conducted a comparative analysis demonstrating that our method outperforms other pretrained models, namely XLM-RoBERTa, Multilingual BERT, mBART, and M2M-100. Experiments show that the proposed approach based on the mT5 transformer and attention layers achieves high results for all three language pairs, with a plagdet score of 98.38% for English-French, 98.03% for English-Spanish, and 98.73% for English-German, and a granularity of 1.00 for every language pair.
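The pipeline described above (contextual embeddings → multi-head attention → mean pooling → sigmoid score) can be illustrated with a minimal numpy sketch. Random vectors stand in for the mT5 sentence-pair embeddings, and all dimensions, the pooling choice, and the randomly initialized projection weights are illustrative assumptions, not the paper's trained parameters (d_model=512 merely mirrors mT5-small's hidden size):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, num_heads, rng):
    """Scaled dot-product self-attention over X: (seq_len, d_model)."""
    seq_len, d_model = X.shape
    d_head = d_model // num_heads
    heads = []
    for _ in range(num_heads):
        # Randomly initialized per-head projections (illustrative only).
        Wq, Wk, Wv = (rng.standard_normal((d_model, d_head)) / np.sqrt(d_model)
                      for _ in range(3))
        Q, K, V = X @ Wq, X @ Wk, X @ Wv
        scores = softmax(Q @ K.T / np.sqrt(d_head))  # (seq_len, seq_len)
        heads.append(scores @ V)
    return np.concatenate(heads, axis=-1)  # back to (seq_len, d_model)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Placeholder "mT5" embeddings for a concatenated source/suspicious pair.
pair_embedding = rng.standard_normal((32, 512))
attended = multi_head_attention(pair_embedding, num_heads=8, rng=rng)
pooled = attended.mean(axis=0)              # mean pooling over tokens
w = rng.standard_normal(512) / np.sqrt(512)
plagiarism_prob = sigmoid(pooled @ w)       # binary plagiarism score in (0, 1)
```

In the actual system the attention and output weights would be learned end-to-end on labeled plagiarism pairs; this sketch only shows how the layers compose.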
Data Availability
The datasets are accessible and referenced in this article.
Funding
No funding was received.
Author information
Authors and Affiliations
Corresponding author
Ethics declarations
Conflict of Interest
All authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent
Not applicable.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Bouaine, C., Benabbou, F. & Zaoui, C. Unlocking the Potential of Transformers with mT5 and Attention Mechanisms in Multilingual Plagiarism Detection. SN COMPUT. SCI. 6, 849 (2025). https://doi.org/10.1007/s42979-025-04379-2
Received:
Accepted:
Published: