Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.23575 (cs)

[Submitted on 31 Jul 2025]

Title: Beyond Gloss: A Hand-Centric Framework for Gloss-Free Sign Language Translation

Authors: Sobhan Asasi, Mohamed Ilyas Lakhal, Ozge Mercanoglu Sincan, Richard Bowden

Abstract: Sign Language Translation (SLT) is a challenging task that requires bridging the modality gap between visual and linguistic information while capturing subtle variations in hand shapes and movements. To address these challenges, we introduce BeyondGloss, a novel gloss-free SLT framework that leverages the spatio-temporal reasoning capabilities of Video Large Language Models (VideoLLMs). Since existing VideoLLMs struggle to model long videos in detail, we propose a novel approach to generate fine-grained, temporally-aware textual descriptions of hand motion. A contrastive alignment module aligns these descriptions with video features during pre-training, encouraging the model to focus on hand-centric temporal dynamics and distinguish signs more effectively. To further enrich hand-specific representations, we distill fine-grained features from HaMeR. Additionally, we apply a contrastive loss between sign video representations and target language embeddings to reduce the modality gap in pre-training. BeyondGloss achieves state-of-the-art performance on the Phoenix14T and CSL-Daily benchmarks, demonstrating the effectiveness of the proposed framework. We will release the code upon acceptance of the paper.
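
The abstract mentions contrastive alignment between video features and text (hand-motion descriptions, and target language embeddings) but does not state the form of the loss. As an illustration only, the sketch below assumes a symmetric InfoNCE objective of the kind commonly used for video-text alignment. The function names, the batch pairing convention, and the temperature of 0.07 are all assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def log_softmax(row):
    """Numerically stable log-softmax over a list of logits."""
    m = max(row)
    log_sum = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_sum for x in row]


def symmetric_contrastive_loss(video_feats, text_feats, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    Assumption: row i of video_feats is paired with row i of text_feats,
    and every other row in the batch serves as a negative.
    """
    if len(video_feats) != len(text_feats) or not video_feats:
        raise ValueError("need equally sized, non-empty batches")

    v = [l2_normalize(x) for x in video_feats]
    t = [l2_normalize(x) for x in text_feats]
    n = len(v)

    # logits[i][j] = cosine similarity of video i and text j, divided by temperature
    logits = [
        [sum(a * b for a, b in zip(v[i], t[j])) / temperature for j in range(n)]
        for i in range(n)
    ]

    # Video-to-text direction: softmax over each row, target is the diagonal.
    v2t = -sum(log_softmax(logits[i])[i] for i in range(n)) / n

    # Text-to-video direction: softmax over each column, target is the diagonal.
    columns = [[logits[i][j] for i in range(n)] for j in range(n)]
    t2v = -sum(log_softmax(columns[j])[j] for j in range(n)) / n

    return 0.5 * (v2t + t2v)


# Toy usage: correctly paired embeddings should score a lower loss
# than the same embeddings with the text side swapped.
aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

Under this assumed objective, minimizing the loss pulls each video embedding toward its own text embedding and pushes it away from the other texts in the batch, which is one common way a modality gap of the kind described in the abstract is reduced.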

Comments:	Accepted at BMVC 2025
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2507.23575 [cs.CV]
	(or arXiv:2507.23575v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.23575

Submission history

From: Sobhan Asasi
[v1] Thu, 31 Jul 2025 14:06:07 UTC (15,187 KB)
