Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.23479 (cs)

[Submitted on 31 Jul 2025]

Title: Seeing More with Less: Video Capsule Endoscopy with Multi-Task Learning

Authors: Julia Werner, Oliver Bause, Julius Oexle, Maxime Le Floch, Franz Brinkmann, Jochen Hampe, Oliver Bringmann

Abstract: Video capsule endoscopy has become increasingly important for investigating the small intestine within the gastrointestinal tract. However, the short battery lifetime of such compact sensor edge devices remains a persistent challenge. Integrating artificial intelligence can help overcome this limitation by enabling intelligent real-time decision-making, thereby reducing energy consumption and prolonging battery life. This remains difficult, however, because data are sparse and the limited resources of the device restrict the overall model size. In this work, we introduce a multi-task neural network that combines precise self-localization within the gastrointestinal tract with the detection of anomalies in the small intestine in a single model. Throughout the development process, we consistently restricted the total number of parameters to ensure that such a model can feasibly be deployed in a small capsule. We report the first multi-task results using the recently published Galar dataset, integrating established multi-task methods and Viterbi decoding for subsequent time-series analysis. This outperforms current single-task models and represents a significant advance in AI-based approaches in this field. Our model achieves an accuracy of 93.63% on the localization task and an accuracy of 87.48% on the anomaly detection task. The approach requires only 1 million parameters while surpassing the current baselines.
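
The abstract mentions Viterbi decoding for time-series analysis of the per-frame predictions. As a rough illustration of the general technique only (not the authors' implementation), the sketch below smooths noisy per-frame class probabilities with a standard Viterbi pass. The function name `viterbi`, the three-section layout, and all probability values are hypothetical and chosen purely for the example; the paper's actual states, transition model, and parameters are not given on this page.

```python
import math


def viterbi(frame_probs, transition, prior):
    """Return the most likely state sequence for a series of frames.

    frame_probs[t][s]: classifier probability of state s at frame t (assumed input)
    transition[r][s]:  probability of moving from state r to state s (assumed)
    prior[s]:          probability of starting in state s (assumed)
    """
    eps = 1e-12  # floor to avoid log(0); zero-probability moves get a large penalty

    def log(p):
        return math.log(max(p, eps))

    n_states = len(prior)
    # score[s]: best log-score of any path that ends in state s at the current frame
    score = [log(prior[s]) + log(frame_probs[0][s]) for s in range(n_states)]
    backpointers = []

    for probs in frame_probs[1:]:
        new_score, pointers = [], []
        for s in range(n_states):
            # Pick the predecessor that maximizes the path score into state s.
            best_prev = max(
                range(n_states),
                key=lambda r: score[r] + log(transition[r][s]),
            )
            pointers.append(best_prev)
            new_score.append(
                score[best_prev] + log(transition[best_prev][s]) + log(probs[s])
            )
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical example: three consecutive sections (0, 1, 2) that the capsule
# can only traverse in forward order; frame 2 is a noisy outlier.
transition = [[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]]
prior = [1.0, 0.0, 0.0]
frame_probs = [
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.2, 0.1, 0.7],  # per-frame argmax would jump to section 2 here
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.8, 0.1],
]
smoothed = viterbi(frame_probs, transition, prior)
```

In this toy input, taking the per-frame argmax yields the sequence 0, 0, 2, 0, 1, 1, which is anatomically implausible under the assumed forward-only transitions, while the decoded path stays in section 0 until the evidence for section 1 is consistent.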

Comments:	Accepted at Applications of Medical AI (AMAI workshop) at MICCAI 2025 (submitted version)
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2507.23479 [cs.CV]
	(or arXiv:2507.23479v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.23479

Submission history

From: Julia Werner
[v1] Thu, 31 Jul 2025 12:00:25 UTC (188 KB)
