SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation
July 31, 2025
Abstract
Hyperspectral imaging (HSI) provides rich spectral information for medical imaging, yet faces significant challenges due to data limitations and hardware variations. We introduce SAMSA, a novel interactive segmentation framework that combines an RGB foundation model with spectral analysis. SAMSA efficiently uses user clicks to guide both RGB segmentation and spectral similarity computations. The method addresses key limitations in HSI segmentation through a unique spectral feature fusion strategy that operates independently of spectral band count and resolution. Evaluation on publicly available datasets shows 81.0% 1-click and 93.4% 5-click DICE on a neurosurgical dataset and 81.1% 1-click and 89.2% 5-click DICE on an intraoperative porcine hyperspectral dataset. Experimental results demonstrate SAMSA's effectiveness in few-shot and zero-shot learning scenarios with minimal training examples. Our approach enables seamless integration of datasets with different spectral characteristics, providing a flexible framework for hyperspectral medical image analysis.
1 Introduction
Hyperspectral imaging (HSI) offers superior intraoperative guidance through its rich spectral information, allowing precise differentiation between visually similar tissues [3, 17]. The diverse range of HSI hardware, with varying spectral ranges and resolutions, creates significant interoperability challenges that impede data standardization [1]. This technical fragmentation, coupled with HSI's limited clinical adoption, has resulted in a shortage of comprehensive datasets, presenting a substantial obstacle for machine learning applications [6]. Despite these challenges, recent advances have demonstrated HSI's potential for intraoperative segmentation in neurosurgery [16, 13, 8] and on porcine organs [15, 19]. However, developing generalized models that account for hardware variations remains an open problem. Classical Spectral Comparison Functions (SCF) such as Spectral Angle (SA) [2] and Pearson's Correlation Coefficient (PCC) [10] offer highly adaptable approaches for comparing spectra in manual image segmentation. Because they require no training, they generalize to new scenarios without additional data and work with any spectral range or number of bands. These methods typically compare a reference point (a user click) against the rest of the image, making them inherently interactive. However, they are limited by the "shading problem", where a single semantic object can exhibit different spectral signatures, and by the difficulty of establishing consistent segmentation thresholds within and across images.
Interactive segmentation is particularly valuable in medical imaging, as it leverages expert input to improve performance compared to fully automated methods [22, 20, 21] and enables segmentation of previously unseen tissue classes, a vital capability during surgical procedures where unexpected pathological findings may occur. The shared interactive nature of classical spectral methods and modern RGB interactive segmentation presents a natural opportunity to combine these approaches, allowing a single user click to serve a dual purpose: guiding the RGB-based model while simultaneously providing a reference point for spectral comparison. While powerful interactive models like Segment Anything and its successor SAM2 [7, 12] have revolutionized RGB segmentation, these advances cannot be directly applied to HSI due to fundamental differences in data characteristics.
In this work, we propose an interactive image segmentation approach that combines SAM2 with spectral analysis techniques to overcome HSI's data limitations. Our approach leverages the advantages of large-scale RGB foundation models while integrating HSI's rich spectral information. Specifically, we contribute: (1) An interactive segmentation framework for HSI with a dual-input approach that uses the same user input (clicks) in two complementary ways, to guide an RGB foundation model and to compute SCF measurements on the HSI data, enhancing segmentation performance. (2) A demonstration of effectiveness in both few-shot and zero-shot learning scenarios for tumor segmentation, showing robust performance with extremely limited training examples and on unseen test cases. (3) The first HSI machine learning framework that functions independently of HSI band count and wavelength variations, enabling datasets with different spectral characteristics and semantic classes to be combined into a unified training approach.
2 Methodology

Given a hyperspectral image $\mathbf{X} \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ denote the spatial dimensions and $C$ represents the number of spectral channels, our goal is to perform interactive foreground/background segmentation based on user-provided click positions. Additionally, we have available a corresponding pseudo RGB image $\mathbf{I} \in \mathbb{R}^{H \times W \times 3}$ and a ground truth label map $\mathbf{Y} \in \{1, \dots, K\}^{H \times W}$, with $K$ the number of classes. Let $\mathcal{P} = \{p_1, \dots, p_n\}$ be a set of user-provided click positions, where each $p_i$ corresponds to a pixel location in the image. Each model outputs a foreground/background similarity map $\mathbf{S} \in [0, 1]^{H \times W}$, with $0$ representing no similarity and $1$ representing strong similarity to the clicked pixel(s).
Spectral Comparison Function (SCF). To model the spectral characteristics, we employ a spectral similarity approach based on click positions. Given multiple clicks in the region of interest, we compute the similarity of each pixel in the image to every selected spectrum and assign the highest similarity score. The SCF thus outputs a similarity map $\mathbf{S}_{\mathrm{SCF}} \in [0, 1]^{H \times W}$. SA [2] measures the similarity between two spectra as the angle between them in spectral space. Alternatively, spectra can be compared using PCC [10], which quantifies the linear relationship between the reference and candidate spectra and additionally distinguishes between positive and negative correlations. While SA and PCC effectively measure similarity between spectral samples, such as those selected by clicks, they do not establish decision boundaries, since they are not learned models. To address this, we apply histogram equalization to maximize the information content of the similarity maps by increasing contrast and improving regional separability [5]. We denote this method as $\mathrm{SA}_{\mathrm{eq}}$.
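The SCF step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the spectral angle between a pixel spectrum $\mathbf{x}$ and a reference $\mathbf{r}$ is $\theta = \arccos\big(\mathbf{x}^\top\mathbf{r} / (\|\mathbf{x}\|\,\|\mathbf{r}\|)\big)$, mapped to a similarity as $1 - \theta/(\pi/2)$ (assuming non-negative reflectance spectra); PCC is rescaled from $[-1, 1]$ to $[0, 1]$; the per-pixel maximum is taken over all clicked spectra; and histogram equalization uses a 256-bin empirical CDF. The function names (`spectral_angle`, `pearson`, `scf_map`, `equalize`) and these normalization choices are hypothetical.

```python
import numpy as np


def spectral_angle(cube: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Angle in radians between each pixel spectrum of cube (H, W, C) and ref (C,)."""
    dot = cube @ ref                                             # (H, W)
    norms = np.linalg.norm(cube, axis=-1) * np.linalg.norm(ref)  # (H, W)
    cos = np.clip(dot / np.maximum(norms, 1e-12), -1.0, 1.0)
    return np.arccos(cos)


def pearson(cube: np.ndarray, ref: np.ndarray) -> np.ndarray:
    """Pearson correlation in [-1, 1] between each pixel spectrum and ref."""
    cube_c = cube - cube.mean(axis=-1, keepdims=True)
    ref_c = ref - ref.mean()
    den = np.linalg.norm(cube_c, axis=-1) * np.linalg.norm(ref_c)
    return (cube_c @ ref_c) / np.maximum(den, 1e-12)


def scf_map(cube: np.ndarray, clicks, method: str = "sa") -> np.ndarray:
    """Similarity map in [0, 1]: best match over all clicked reference spectra."""
    maps = []
    for y, x in clicks:
        ref = cube[y, x]
        if method == "sa":
            # Assumes non-negative reflectance, so the angle lies in [0, pi/2].
            sim = 1.0 - spectral_angle(cube, ref) / (np.pi / 2)
        else:
            # Hypothetical rescaling of the correlation from [-1, 1] to [0, 1].
            sim = (pearson(cube, ref) + 1.0) / 2.0
        maps.append(np.clip(sim, 0.0, 1.0))
    return np.max(np.stack(maps), axis=0)  # highest similarity per pixel


def equalize(sim: np.ndarray, bins: int = 256) -> np.ndarray:
    """Histogram equalization of a [0, 1] map via its normalized empirical CDF."""
    hist, edges = np.histogram(sim, bins=bins, range=(0.0, 1.0))
    cdf = np.cumsum(hist).astype(float)
    cdf = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1e-12)
    idx = np.digitize(sim, edges[1:-1])  # bin index in [0, bins - 1]
    return cdf[idx]
```

For example, `equalize(scf_map(cube, [(10, 20), (40, 5)]))` would produce an equalized SA map for two clicks. Taking the maximum over clicks means a pixel only needs to resemble one of the clicked spectra, which is one way to soften the shading problem when several clicks cover differently lit parts of the same structure.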
RGB Interactive Segmentation. To leverage powerful RGB foundation models for HSI, we first generate pseudo RGB images from the HSI data through spectral band selection and combination [4]. SAM2 [12] serves as our RGB segmentation backbone due to its state-of-the-art performance in interactive segmentation and its robust generalization across diverse imaging domains, including medical imaging [11]. From the pseudo RGB image, SAM2 generates a confidence map $\mathbf{M} \in [0, 1]^{H \times W}$ indicating the likelihood of each pixel belonging to the foreground. $\mathrm{SAM2}_{\mathrm{frozen}}$ denotes the SAM2 model with frozen weights, and $\mathrm{SAM2}_{\mathrm{tuned}}$ denotes the fine-tuned version.
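Pseudo RGB generation by band selection can be sketched as below. This assumes the simplest variant, picking the band nearest to each of three target wavelengths and min-max scaling each channel to 8 bits; the procedure of [4] may well differ, and the default target wavelengths and the name `pseudo_rgb` are hypothetical. If a target lies outside the sensor range (e.g. 460 nm for a 500 to 1000 nm camera), the nearest available band is used.

```python
import numpy as np


def pseudo_rgb(cube: np.ndarray, wavelengths, targets=(610.0, 540.0, 460.0)) -> np.ndarray:
    """Build an (H, W, 3) uint8 pseudo RGB image from an (H, W, C) hyperspectral cube.

    For each target wavelength (nm) the nearest band is selected; each channel is
    then min-max scaled independently to [0, 255].
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    idx = [int(np.argmin(np.abs(wavelengths - t))) for t in targets]
    rgb = cube[..., idx].astype(float)            # (H, W, 3)
    lo = rgb.min(axis=(0, 1), keepdims=True)
    hi = rgb.max(axis=(0, 1), keepdims=True)
    rgb = (rgb - lo) / np.maximum(hi - lo, 1e-12)
    return (rgb * 255).round().astype(np.uint8)
```

The resulting (H, W, 3) `uint8` array is the kind of input RGB foundation models expect; in the official `sam2` package, for instance, `SAM2ImagePredictor.set_image` accepts such an array before `predict` is called with point coordinates and labels.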
RGB and Spectral Similarity. Spectral and RGB-based models offer complementary information for image segmentation. Although SAM2 processes only RGB information, it complements our spectral similarity-based segmentation by capturing spatial and contextual features that may not be evident in pure spectral analysis. To enhance segmentation quality, we explored two initial approaches for fusing the spectral and spatial similarity maps. The first is a simple intersection method in which the similarity maps are multiplied element-wise: $\mathbf{S}_{\cap} = \mathbf{S}_{\mathrm{SCF}} \odot \mathbf{M}$. This multiplication produces high values only in regions where both modalities agree, effectively acting as a soft logical AND that requires consensus between spectral and spatial information for pixel classification. For a more sophisticated integration, we implement a UNet architecture [14] that takes the similarity maps $\mathbf{S}_{\mathrm{SCF}}$ and $\mathbf{M}$ as direct inputs to learn an optimal fusion strategy: $\mathbf{S}_{\mathrm{UNet}} = f_{\theta}(\mathbf{S}_{\mathrm{SCF}}, \mathbf{M})$, where $f_{\theta}$ represents the trained fusion UNet. Unlike the deterministic intersection approach, this learnable fusion can capture complementary spatial relationships between the modalities.
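A minimal NumPy sketch of the intersection fusion, plus one plausible way to arrange the two maps as a two-channel input for a learned fusion network. The names `intersection_fusion` and `unet_input` are hypothetical, and the channel-stacking layout is an assumption about how the UNet consumes its inputs.

```python
import numpy as np


def intersection_fusion(s_spec: np.ndarray, m_rgb: np.ndarray) -> np.ndarray:
    """Element-wise product of spectral and RGB maps: a soft logical AND."""
    s_spec, m_rgb = np.asarray(s_spec, float), np.asarray(m_rgb, float)
    if s_spec.shape != m_rgb.shape:
        raise ValueError("maps must share the same (H, W) shape")
    return s_spec * m_rgb


def unet_input(s_spec: np.ndarray, m_rgb: np.ndarray) -> np.ndarray:
    """Stack both maps into a (2, H, W) array, e.g. as input to a fusion network."""
    return np.stack([s_spec, m_rgb], axis=0)


s = np.array([[0.9, 0.9], [0.2, 0.9]])  # spectral similarity
m = np.array([[0.9, 0.1], [0.9, 0.8]])  # RGB confidence
fused = intersection_fusion(s, m)       # [[0.81, 0.09], [0.18, 0.72]]
```

Thresholding `fused` at 0.5 keeps only the two pixels where both maps are confident; a single high score (0.9 paired with 0.1 or 0.2) is suppressed. Since the product never exceeds the smaller of its two inputs, a single confident modality cannot override the other.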
SAMSA. To further improve segmentation, we introduce SAMSA, a novel model that fuses spectral similarity with high-resolution spatial features from SAM2. Unlike the aforementioned fusion approaches, which combine outputs after segmentation, SAMSA integrates spectral information directly into the upscaling process of the SAM2 mask decoder. A high-level overview of this process is shown in Fig. 1. Given the pseudo RGB image and the user clicks, SAMSA follows the standard SAM2 processing pipeline and additionally integrates the spectral information: the spectral similarity map is fused with the high-resolution feature maps extracted from SAM2's encoder, so that segmentation decisions are informed by spectral properties. This allows the model to exploit spectral characteristics that are not visible in the pseudo RGB image while retaining SAM2's spatial precision. We freeze SAM2's prompt and image encoders and fine-tune only the lightweight mask decoder. This enables SAMSA to adapt to medical datasets with minimal training data while learning how to effectively combine spatial and spectral information.
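The exact fusion operator inside the mask decoder is not specified in this section, so the following is only a hypothetical sketch of one simple option: area-downsample the spectral similarity map to the resolution of a decoder feature map, append it as an extra channel, and project back to the original channel count with a 1x1 convolution (written here as an `einsum`). The names `avg_pool_to` and `fuse_spectral`, and the concatenate-then-project design, are assumptions.

```python
import numpy as np


def avg_pool_to(sim: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Area-downsample an (H, W) map to (out_h, out_w); crops any remainder."""
    h, w = sim.shape
    fh, fw = h // out_h, w // out_w
    return sim[: out_h * fh, : out_w * fw].reshape(out_h, fh, out_w, fw).mean(axis=(1, 3))


def fuse_spectral(features: np.ndarray, sim: np.ndarray,
                  weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Inject a spectral similarity map into a (C, h, w) decoder feature map.

    The map is resized to the feature resolution, appended as an extra channel,
    and projected back to C channels by a 1x1 convolution
    (weight: (C, C + 1), bias: (C,)); in training these would be learnable.
    """
    c, h, w = features.shape
    s = avg_pool_to(sim, h, w)[None]             # (1, h, w)
    x = np.concatenate([features, s], axis=0)    # (C + 1, h, w)
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]
```

In a PyTorch implementation, freezing the prompt and image encoders would typically mean calling `requires_grad_(False)` on those submodules and passing only the mask-decoder parameters (including any fusion layer) to the optimizer.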
3 Experimental Results
For training the fusion models, we largely follow SAM's optimization procedure [7]. All models are trained with a combined loss of DICE and cross-entropy with equal weighting, excluding unlabeled regions. Complete implementation details are provided in the accompanying source code, which will be made accessible upon acceptance of this manuscript (REDACTED_CODE_REPOSITORY_LINK).
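The combined loss can be sketched as follows, assuming a binary foreground/background formulation, a soft DICE term, and a boolean mask that marks labeled pixels. The name `dice_ce_loss` and the `eps` smoothing are hypothetical; the authors' exact formulation may differ.

```python
import numpy as np


def dice_ce_loss(prob: np.ndarray, target: np.ndarray, valid: np.ndarray,
                 eps: float = 1e-6) -> float:
    """Equal-weight soft DICE loss + binary cross-entropy over labeled pixels only.

    prob:   predicted foreground probabilities, shape (H, W)
    target: binary foreground mask, shape (H, W)
    valid:  boolean mask, False for Unlabeled pixels (excluded from both terms)
    """
    p = np.clip(prob[valid], eps, 1.0 - eps)
    t = target[valid].astype(float)
    # Binary cross-entropy averaged over labeled pixels.
    ce = -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    # Soft DICE coefficient; eps keeps the ratio defined for empty masks.
    dice = (2.0 * np.sum(p * t) + eps) / (np.sum(p) + np.sum(t) + eps)
    return float(ce + (1.0 - dice))
```

Because both terms are computed only on `prob[valid]`, predictions in Unlabeled regions have no effect on the loss and therefore receive no gradient.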
Datasets. The HiB dataset includes hyperspectral and pseudo RGB images from 34 patients, with patient-wise fold splits [8]. It features four labeled classes: Background, Tumor, Healthy, and Vasculature, plus an Unlabeled category. Following preprocessing as in [9], the dataset consists of 128 spectral bands. The HeiPorSPECTRAL (Heipor) dataset, collected from 20 porcine subjects at Heidelberg University Hospital, provides HSI data with annotations for 20 distinct organs. Spectral information ranges from 500 nm to 1000 nm, and corresponding RGB images are derived from the HSI data [19].
Evaluation Protocol. We evaluate each model on foreground/background segmentation using a single user click, following the evaluation procedure of SAM [7]. For each class, we place the click at the center of the largest connected component of the foreground region to avoid boundary ambiguity. We report two key metrics: D@0.5, the DICE score [18] at the standard decision boundary of 0.5; and D@Max, the maximum DICE across all thresholds, representing optimal performance without a predefined decision boundary.
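The click placement can be sketched as below. "Center" is interpreted here as the most interior pixel of the largest 4-connected component, found by repeated erosion (a city-block approximation of "farthest from the boundary"), with ties broken by distance to the centroid. The connectivity, the erosion metric, and the names `largest_component` and `interior_click` are assumptions; the mask must contain at least one foreground pixel.

```python
import numpy as np
from collections import deque


def largest_component(mask: np.ndarray) -> np.ndarray:
    """Boolean mask of the largest 4-connected foreground component (BFS labeling)."""
    mask = mask.astype(bool)
    h, w = mask.shape
    labels = np.zeros((h, w), dtype=int)
    best_label, best_size, current = 0, 0, 0
    for sy, sx in zip(*np.nonzero(mask)):
        if labels[sy, sx]:
            continue  # pixel already belongs to a labeled component
        current += 1
        labels[sy, sx] = current
        queue, size = deque([(sy, sx)]), 0
        while queue:
            y, x = queue.popleft()
            size += 1
            for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not labels[ny, nx]:
                    labels[ny, nx] = current
                    queue.append((ny, nx))
        if size > best_size:
            best_label, best_size = current, size
    return labels == best_label if best_size else np.zeros((h, w), dtype=bool)


def interior_click(mask: np.ndarray) -> tuple:
    """Most interior pixel of the largest component, via repeated 4-neighbour erosion."""
    current = largest_component(mask)
    while True:
        p = np.pad(current, 1, constant_values=False)
        eroded = current & p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
        if not eroded.any():
            break  # `current` holds the deepest non-empty erosion level
        current = eroded
    ys, xs = np.nonzero(current)
    # Break ties by proximity to the centroid of the deepest level.
    i = int(np.argmin((ys - ys.mean()) ** 2 + (xs - xs.mean()) ** 2))
    return int(ys[i]), int(xs[i])
```

An interior point is preferred over the plain centroid because the centroid of a non-convex structure (e.g. a curved vessel) can fall outside the structure itself.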
We report macro-averaged (Macro) and per-class results. We also evaluate multi-click performance by placing subsequent clicks on the target foreground class. Finally, for trainable models, we conduct N-shot evaluations (1, 3, 5, 10, and 20 examples) to analyze the relationship between training data availability and segmentation quality.
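The two metrics and the macro average can be sketched as follows. The threshold grid for D@Max (101 evenly spaced values in [0, 1]), the use of `>=` at the 0.5 boundary, the convention of returning 1.0 when both masks are empty, and all function names are assumptions.

```python
import numpy as np


def dice(pred: np.ndarray, gt: np.ndarray) -> float:
    """DICE = 2|P ∩ G| / (|P| + |G|) for binary masks; 1.0 if both are empty."""
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    return 2.0 * np.logical_and(pred, gt).sum() / denom if denom else 1.0


def d_at_05(sim: np.ndarray, gt: np.ndarray) -> float:
    """D@0.5: DICE after thresholding the similarity map at 0.5."""
    return dice(sim >= 0.5, gt)


def d_at_max(sim: np.ndarray, gt: np.ndarray, n_thresholds: int = 101) -> float:
    """D@Max: best DICE over a uniform grid of thresholds in [0, 1]."""
    return max(dice(sim >= t, gt) for t in np.linspace(0.0, 1.0, n_thresholds))


def macro(per_class: dict) -> float:
    """Macro average: unweighted mean of per-class scores."""
    return float(np.mean(list(per_class.values())))
```

For a ground truth `[1, 1, 0, 0]` and similarities `[0.9, 0.4, 0.3, 0.1]`, D@0.5 is 2/3 while D@Max reaches 1.0 at any threshold between 0.3 and 0.4. This gap is exactly what the two metrics separate: similarity quality versus calibration to the fixed 0.5 boundary.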
Mod. | Model | Heipor 1 click D@0.5 | Heipor 1 click D@Max | Heipor 5 clicks D@0.5 | Hib 1 click D@0.5 | Hib 1 click D@Max | Hib 5 clicks D@0.5 |
---|---|---|---|---|---|---|---|
HSI | PCC | 0.122 | 0.472 | 0.117 | | | |
HSI | SA | 0.117 | 0.489 | 0.117 | | | |
HSI | SA (equalized) | 0.205 | 0.487 | 0.137 | | | |
RGB | SAM2 (frozen) | 0.600 | 0.773 | 0.643 | | | |
RGB | SAM2 (fine-tuned) | 0.806 | 0.864 | 0.886 | | | |
Fusion | Intersection | 0.634 | 0.755 | 0.647 | | | |
Fusion | UNet | 0.692 | 0.798 | 0.771 | | | |
Fusion | SAMSA (ours) | 0.811 | 0.863 | 0.892 | | | |
In our evaluation of spectral similarity functions, SA outperformed PCC on both the Heipor and Hib datasets when measured by D@Max (Table 1), e.g. 0.489 vs. 0.472 on Heipor. We further enhanced SA with histogram equalization, which increases contrast around the 0.5 threshold to better align its output with the RGB models, and adopted equalized SA as our spectral analysis method for all subsequent experiments.
For RGB-only performance (Table 1), frozen SAM2 demonstrated reasonable generalization to medical domains, achieving a Macro D@0.5 of 0.600 on Heipor. However, Table 2 reveals a significant weakness on the Hib dataset's Vascular class, indicating limited generalization to domain-specific medical structures. Fine-tuning substantially improved performance, with fine-tuned SAM2 gaining markedly on the Vascular and Background classes.
Our analysis of fusion strategies revealed that the late-fusion approaches, namely intersection and UNet fusion, underperformed fine-tuned SAM2, although they improved upon frozen SAM2. This suggests that spectral information must be integrated earlier to enhance segmentation performance, which we implemented in SAMSA.
SAMSA consistently outperformed fine-tuned SAM2 across all classes on D@0.5, with notable improvements for the Healthy and Tumor classes. Macro D@0.5 increased from 0.771 to 0.810 on Hib and from 0.806 to 0.811 on Heipor (Table 3). The modest gain on Heipor can be attributed to its RGB-oriented annotations and the predominance of large, centered objects (Fig. 3). These characteristics particularly favor RGB-only models that detect visual boundaries, as evidenced by the strong zero-shot performance of frozen SAM2 relative to its fine-tuned version. For this reason, we focused our per-class analysis on the Hib dataset, where spectral information provides more substantial benefits for segmentation.
As expected, additional clicks improved segmentation performance for all fine-tuned models. SAMSA showed significant improvements with 5-click inputs, increasing Macro D@0.5 from 0.811 to 0.892 on Heipor and from 0.810 to 0.934 on Hib. Fig. 2 shows SAMSA's advantage over fine-tuned SAM2 across different click counts on Hib. Furthermore, SAMSA remains effective for single-click segmentation with only 20 training examples. Leveraging foundation models, both SAMSA and SAM2 perform well in limited-data scenarios. Notably, the integration of spectral information consistently enhances training, with a clear performance gap between SAMSA and fine-tuned SAM2 emerging at just 5 training examples, highlighting the advantage of spectral information in low-data regimes.

Generalization Results. We conduct a leave-one-class-out experiment on both fine-tuned SAM2 and SAMSA by removing the Tumor class from training while testing across all classes on Hib, simulating real-world scenarios requiring identification of novel structures without prior supervision.
Model | Macro | Background | Healthy | Vascular | Tumor |
---|---|---|---|---|---|
SAMSA(ours) | |||||
0-shot case - excluded Tumor class from train | |||||
SAMSA (ours) |
As seen in Table 2, when the Tumor class is excluded from training, the Tumor score of fine-tuned SAM2 drops, falling below even that of frozen SAM2. Despite this, its overall Macro performance remains significantly better than that of frozen SAM2. Similarly, SAMSA's Tumor score decreases, but crucially SAMSA maintains the highest tumor detection capability. Additionally, SAMSA achieves a higher overall Macro result than fine-tuned SAM2, suggesting that incorporating spectral information provides meaningful advantages for generalizing to unseen classes.
Training | None | Heipor | Heipor | Hib | Hib | Mixed | Mixed |
---|---|---|---|---|---|---|---|
HSI Channels | - | 100 | 100 | 128 | 128 | 238 | 238 |
Num Classes | - | 20 | 20 | 4 | 4 | 24 | 24 |
Test | SAM2 (0-shot) | SAM2 (tuned) | SAMSA | SAM2 (tuned) | SAMSA | SAM2 (tuned) | SAMSA |
Heipor | 0.600 | 0.806 | 0.811 | 0.445 | 0.433 | 0.807 | 0.810 |
Hib | 0.523 | 0.454 | 0.497 | 0.771 | 0.810 | 0.695 | 0.765 |
Secondly, our approach uniquely enables training across datasets with different spectral properties by collapsing the spectral information into a single channel, regardless of band count or resolution. In Table 3, cross-dataset generalization (training on one dataset, testing on the other) performs poorly, falling even below the zero-shot baseline. However, mixed training significantly improves results. While fine-tuned SAM2 shows inconsistent benefits from mixed training (improved on Heipor, decreased on Hib), SAMSA maintains balanced performance, outperforming fine-tuned SAM2 on both datasets (Hib 0.765 vs. 0.695, Heipor 0.810 vs. 0.807). This supports SAMSA's ability to generalize across heterogeneous HSI datasets with varying spectral properties and clinical domains.
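The band-count independence can be illustrated with a short sketch: an SA similarity map collapses an (H, W, C) cube to a single (H, W) channel for any C, so cubes with 100 and 128 bands (the counts listed in Table 3) produce maps of identical layout that can share one training batch. The cubes below are synthetic random stand-ins, not the actual datasets; the function name `sa_similarity` and the angle-to-similarity mapping (which assumes non-negative spectra) are hypothetical, and in practice differing spatial sizes would also need resizing before stacking.

```python
import numpy as np


def sa_similarity(cube: np.ndarray, click: tuple) -> np.ndarray:
    """Collapse an (H, W, C) cube to one (H, W) SA similarity map for any band count C."""
    ref = cube[click]                                   # clicked spectrum, shape (C,)
    norms = np.linalg.norm(cube, axis=-1) * np.linalg.norm(ref)
    cos = np.clip((cube @ ref) / np.maximum(norms, 1e-12), -1.0, 1.0)
    return 1.0 - np.arccos(cos) / (np.pi / 2)           # assumes non-negative spectra


rng = np.random.default_rng(0)
heipor_like = rng.random((64, 64, 100))   # synthetic cube with 100 bands
hib_like = rng.random((64, 64, 128))      # synthetic cube with 128 bands

# Both cubes yield single-channel maps of identical shape, so they can share a batch.
batch = np.stack([sa_similarity(heipor_like, (10, 10)),
                  sa_similarity(hib_like, (20, 30))])
print(batch.shape)  # (2, 64, 64)
```

Everything downstream of this map (the decoder fusion and the loss) sees only the (H, W) similarity channel, which is why datasets with different spectral characteristics can be mixed without re-architecting the model.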
In Fig. 3 we present qualitative results. On the Hib dataset, when clicking on vascular tissue (a), the RGB-only SAM2 prediction (d) struggles to segment the vascular class without spectral information. The SA map (e) clearly identifies vascular structures but introduces noise around the tumor region. In contrast, SAMSA (f) produces a well-localized probability map for vascular tissue. On the Heipor dataset, clicking on small bowel tissue (g) shows that SAMSA delineates class boundaries precisely when compared with the ground truth (h).

4 Conclusion
SAMSA enables generalization across HSI datasets with different spectral characteristics, supporting effective segmentation in scenarios with limited training data and diverse imaging conditions. The proposed framework's ability to combine spectral and RGB information provides significant advantages, particularly in detecting challenging medical structures and maintaining performance across datasets. Our approach shows promise in handling unseen classes and adapting to heterogeneous HSI datasets in low-data regimes, opening new possibilities for flexible and robust hyperspectral interactive medical image analysis.
References
- [1] Anichini, G., Leiloglou, M., Hu, Z., O'Neill, K., Elson, D.: Hyperspectral and multispectral imaging in neurosurgery: a systematic literature review and meta-analysis. European Journal of Surgical Oncology p. 108293 (2024). https://doi.org/10.1016/j.ejso.2024.108293
- [2] Boardman, J.: Spectral angle mapping: a rapid measure of spectral similarity. AVIRIS. Delivered by Ingenta (1993)
- [3] Clancy, N.T., Jones, G., Maier-Hein, L., Elson, D.S., Stoyanov, D.: Surgical spectral imaging. Medical Image Analysis 63, 101699 (July 2020). https://doi.org/10.1016/j.media.2020.101699, epub 2020 Apr 13
- [4] Czempiel, T., Roddan, A., Leiloglou, M., Hu, Z., O'Neill, K., Anichini, G., Stoyanov, D., Elson, D.: RGB to hyperspectral: Spectral reconstruction for enhanced surgical imaging. Healthcare Technology Letters 11(6), 307–317 (2024). https://doi.org/10.1049/htl2.12098
- [5] Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Pearson, New York, NY, fourth edition, global edition edn. (2018)
- [6] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016), http://www.deeplearningbook.org
- [7] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., Dollár, P., Girshick, R.: Segment anything (2023), https://arxiv.org/abs/2304.02643
- [8] Leon, R., Fabelo, H., Ortega, S., Cruz-Guerrero, I., Campos-Delgado, D., Szolna, A., Piñeiro, J., Espino, C., O’Shanahan, A., Hernandez, M., Carrera, D., Bisshopp, S., Sosa, C., Balea-Fernandez, F., Morera, J., Clavo, B., Callico, G.: Hyperspectral imaging benchmark based on machine learning for intraoperative brain tumour detection. NPJ Precision Oncology 7(1), 119 (November 2023). https://doi.org/10.1038/s41698-023-00475-9
- [9] Martinez, B., Leon, R., Fabelo, H., Ortega, S., Piñeiro, J.F., Szolna, A., Hernandez, M., Espino, C., J. O’Shanahan, A., Carrera, D., et al.: Most relevant spectral bands identification for brain cancer detection using hyperspectral imaging. Sensors 19(24), 5481 (2019)
- [10] Meneses, P.R.: Spectral correlation mapper (SCM): An improvement on the spectral angle mapper (SAM) (2000)
- [11] Murali, A., Mascagni, P., Mutter, D., Padoy, N.: CycleSAM: One-shot surgical scene segmentation using cycle-consistent feature matching to prompt SAM (2024), https://arxiv.org/abs/2407.06795
- [12] Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., Mintun, E., Pan, J., Alwala, K.V., Carion, N., Wu, C.Y., Girshick, R., Dollár, P., Feichtenhofer, C.: SAM 2: Segment anything in images and videos (2024), https://arxiv.org/abs/2408.00714
- [13] Roddan, A., Czempiel, T., Elson, D.S., Giannarou, S.: Calibration-jitter: Augmentation of hyperspectral data for improved surgical scene segmentation. Healthcare Technology Letters 11(6), 345–354 (2024). https://doi.org/10.1049/htl2.12102
- [14] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation (2015), https://arxiv.org/abs/1505.04597
- [15] Seidlitz, S., Sellner, J., Odenthal, J., Özdemir, B., Studier-Fischer, A., Knödler, S., Ayala, L., Adler, T.J., Kenngott, H.G., Tizabi, M., Wagner, M., Nickel, F., Müller-Stich, B.P., Maier-Hein, L.: Robust deep learning-based semantic organ segmentation in hyperspectral images. Medical Image Analysis 80, 102488 (2022). https://doi.org/10.1016/j.media.2022.102488
- [16] Shapey, J., Xie, Y., Nabavi, E., Bradford, R., Saeed, S.R., Ourselin, S., Vercauteren, T.: Intraoperative multispectral and hyperspectral label-free imaging: A systematic review of in vivo clinical studies. Journal of Biophotonics 12(9), e201800455 (Sep 2019). https://doi.org/10.1002/jbio.201800455, epub 2019 Apr 29
- [17] Shapey, J., Xie, Y., Nabavi, E., Bradford, R., Saeed, S.R., Ourselin, S., Vercauteren, T.: Intraoperative multispectral and hyperspectral label-free imaging: A systematic review of in vivo clinical studies. Journal of Biophotonics 12(9), e201800455 (2019). https://doi.org/10.1002/jbio.201800455
- [18] Sørensen, T.: A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons (1948)
- [19] Studier-Fischer, A., Seidlitz, S., Sellner, J., Bressan, M., Özdemir, B., Ayala, L., Odenthal, J., Knoedler, S., Kowalewski, K.F., Haney, C.M., Salg, G., Dietrich, M., Kenngott, H., Gockel, I., Hackert, T., Müller-Stich, B.P., Maier-Hein, L., Nickel, F.: HeiPorSPECTRAL - the Heidelberg porcine hyperspectral imaging dataset of 20 physiological organs. Scientific Data 10(1), 414 (June 2023). https://doi.org/10.1038/s41597-023-02315-8
- [20] Wang, G., Li, W., Zuluaga, M.A., et al.: Interactive medical image segmentation using deep learning with image-specific fine tuning. IEEE Transactions on Medical Imaging 37(7), 1562–1573 (2018). https://doi.org/10.1109/TMI.2018.2791721
- [21] Wang, G., Zuluaga, M.A., Li, W., Pratt, R., Patel, P.A., Aertsen, M., Doel, T., David, A.L., Deprest, J., Ourselin, S., Vercauteren, T.: DeepIGeoS: A deep interactive geodesic framework for medical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(7), 1559–1572 (Jul 2019). https://doi.org/10.1109/TPAMI.2018.2840695
- [22] Zhao, F., Xie, X.: An overview of interactive medical image segmentation. Annals of the BMVA 2013(7), 1–22 (2013)