Computer Science > Computer Vision and Pattern Recognition

arXiv:2507.23357 (cs)

[Submitted on 31 Jul 2025]

Title:IN45023 Neural Network Design Patterns in Computer Vision Seminar Report, Summer 2025

Authors:Radu-Andrei Bourceanu, Neil De La Fuente, Jan Grimm, Andrei Jardan, Andriy Manucharyan, Cornelius Weiss, Roman Pflugfelder

View PDF HTML (experimental)

Abstract:This report analyzes the evolution of key design patterns in computer vision by examining six influential papers. The analy- sis begins with foundational architectures for image recognition. We review ResNet, which introduced residual connections to overcome the vanishing gradient problem and enable effective training of significantly deeper convolutional networks. Subsequently, we examine the Vision Transformer (ViT), which established a new paradigm by applying the Transformer ar- chitecture to sequences of image patches, demonstrating the efficacy of attention-based models for large-scale image recogni- tion. Building on these visual representation backbones, we investigate generative models. Generative Adversarial Networks (GANs) are analyzed for their novel adversarial training process, which challenges a generator against a discriminator to learn complex data distributions. Then, Latent Diffusion Models (LDMs) are covered, which improve upon prior generative methods by performing a sequential denoising process in a perceptually compressed latent space. LDMs achieve high-fidelity synthesis with greater computational efficiency, representing the current state-of-the-art for image generation. Finally, we explore self-supervised learning techniques that reduce dependency on labeled data. DINO is a self-distillation framework in which a student network learns to match the output of a momentum-updated teacher, yielding features with strong k-NN classification performance. We conclude with Masked Autoencoders (MAE), which utilize an asymmetric encoder-decoder design to reconstruct heavily masked inputs, providing a highly scalable and effective method for pre-training large-scale vision models.

Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:2507.23357 [cs.CV]
	(or arXiv:2507.23357v1 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2507.23357

Submission history

From: Roman Pflugfelder [view email]
[v1] Thu, 31 Jul 2025 09:08:11 UTC (6,034 KB)

Computer Science > Computer Vision and Pattern Recognition

Title:IN45023 Neural Network Design Patterns in Computer Vision Seminar Report, Summer 2025

Submission history

Access Paper:

References & Citations

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Computer Vision and Pattern Recognition

Title:IN45023 Neural Network Design Patterns in Computer Vision Seminar Report, Summer 2025

Submission history

Access Paper:

References & Citations

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators