SAMSA: Segment Anything Model Enhanced with Spectral Angles for Hyperspectral Interactive Medical Image Segmentation

Alfie Roddan1, Tobias Czempiel1, Chi Xu1, Daniel S. Elson1, Stamatia Giannarou1
1 The Hamlyn Centre for Robotic Surgery, Imperial College London, UK
July 31, 2025
Abstract

Hyperspectral imaging (HSI) provides rich spectral information for medical imaging, yet encounters significant challenges due to data limitations and hardware variations. We introduce SAMSA, a novel interactive segmentation framework that combines an RGB foundation model with spectral analysis. SAMSA efficiently utilizes user clicks to guide both RGB segmentation and spectral similarity computations. The method addresses key limitations in HSI segmentation through a unique spectral feature fusion strategy that operates independently of spectral band count and resolution. On publicly available datasets, SAMSA achieves 81.0% 1-click and 93.4% 5-click DICE on a neurosurgical hyperspectral dataset and 81.1% 1-click and 89.2% 5-click DICE on an intraoperative porcine hyperspectral dataset. Experimental results demonstrate SAMSA’s effectiveness in few-shot and zero-shot learning scenarios with minimal training examples. Our approach enables seamless integration of datasets with different spectral characteristics, providing a flexible framework for hyperspectral medical image analysis.

1 Introduction

Hyperspectral imaging (HSI) offers superior intraoperative guidance through its rich spectral information, allowing precise differentiation between visually similar tissues [3, 17]. The diverse range of HSI hardware, with varying spectral ranges and resolutions, creates significant interoperability challenges that impede data standardization [1]. This technical fragmentation, coupled with HSI’s limited clinical adoption, has resulted in a shortage of comprehensive datasets, presenting a substantial obstacle for machine learning applications [6]. Despite these challenges, recent advances have demonstrated HSI’s potential for intraoperative segmentation in neurosurgery [16, 13, 8] and on porcine organs [15, 19]. However, developing generalized models that account for hardware variations remains unsolved. Classical Spectral Comparison Functions (SCF) such as Spectral Angle (SA) [2] and Pearson’s Correlation Coefficient (PCC) [10] offer highly adaptable approaches for comparing spectra for manual image segmentation. Their untrained nature allows them to generalize to new scenarios without requiring additional data, functioning with any spectral range or number of bands. These methods typically operate by using a reference point (user click) to compare against the rest of the image, making them inherently interactive. However, they face limitations due to the "shading problem" where semantic objects exhibit different spectral signatures, and the challenge of establishing consistent segmentation thresholds within and across images.

Interactive segmentation is particularly valuable in medical imaging, as it leverages expert input to improve performance compared to fully automated methods [22, 20, 21] and enables segmentation of previously unseen tissue classes, a vital capability during surgical procedures where unexpected pathological findings may occur. The shared interactive nature of both classical spectral methods and modern RGB interactive segmentation presents a natural opportunity to combine these approaches, allowing a single user click to serve two purposes: guiding the RGB-based model while simultaneously providing a reference point for spectral comparison. While powerful interactive models like Segment Anything and its successor SAM2 [7, 12] have revolutionized RGB segmentation, these advances cannot be directly applied to HSI due to fundamental differences in data characteristics.

In this work, we propose an interactive image segmentation approach that combines SAM2 with spectral analysis techniques to overcome HSI’s data limitations. Our approach leverages the advantages of large-scale RGB foundation models while integrating HSI’s rich spectral information. Specifically, we contribute: (1) An interactive segmentation framework for HSI that uses the same user input (clicks) in two complementary ways: to guide an RGB foundation model and to compute SCF measurements in HSI data, enhancing segmentation performance. (2) A demonstration of effectiveness in both few-shot and zero-shot learning scenarios for tumor classification, showing robust performance even with extremely limited training examples and on unseen test cases. (3) The first HSI machine learning framework that functions independently of HSI band count and wavelength variations, enabling the combination of datasets with different spectral characteristics and semantic classes into a unified training approach.

2 Methodology

Figure 1: SAMSA outline - a single click in the pseudo RGB is used to guide both the RGB and spectral branch.

Given a hyperspectral image $X \in \mathbb{R}^{H \times W \times C}$, where $H$ and $W$ denote the spatial dimensions and $C$ represents the number of spectral channels, our goal is to perform interactive foreground/background segmentation based on user-provided click positions. Additionally, we have available a corresponding pseudo RGB image $X_{rgb} \in \mathbb{R}^{H \times W \times 3}$ and a ground truth label map $Y \in [0, \ldots, N]^{H \times W}$ with $N$ the number of classes. Let $\mathcal{I} = \{I_{i,j}\}$ be a set of user-provided click positions, where each $I_{i,j} = (i, j)$ corresponds to a pixel location in the image. Each model outputs a foreground/background similarity map $\hat{Y} \in [0, 1]^{H \times W}$ with $0$ representing no similarity and $1$ representing strong similarity to the clicked pixel(s).

Spectral Comparison Function (SCF). To model the spectral characteristics, we employ a spectral similarity approach based on click positions. Given multiple clicks $\mathcal{I}$ in the region of interest, we compute the similarity for each pixel in the image with respect to all selected spectra $S_{i,j} = X(I_{i,j})$ and assign the highest similarity score. Finally, the SCF outputs a similarity map $\hat{Y}_{SCF} = SCF(X, \mathcal{I})$. SA [2] measures the similarity between two spectra using the angle between them in the spectral space. The spectra can also be compared using PCC [10], which quantifies the linear relationship between the reference spectra and candidate spectra. PCC specifically focuses on modeling negative correlations between spectra, distinguishing between positive and negative relationships. While SA and PCC effectively measure similarity between spectral samples, such as those derived from click data, they do not establish decision boundaries since they are not based on learned models. To address this, we employ histogram equalization to maximize the information content of the similarity maps by increasing contrast and improving regional separability [5]. We denote this method as $SCF_{Equalized}$.
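The click-driven spectral angle map and the equalization step can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the linear rescaling of the angle to $[0, 1]$ (angle $0 \to 1$, angle $\pi/2 \to 0$), and the CDF-based equalization with 256 bins are all assumptions made for the example.

```python
import numpy as np


def spectral_angle_similarity(cube, clicks, eps=1e-12):
    """Similarity map in [0, 1] from the spectral angle to the clicked spectra.

    cube:   (H, W, C) hyperspectral image with any number of bands C.
    clicks: list of (row, col) click positions.
    """
    h, w, c = cube.shape
    pixels = cube.reshape(-1, c).astype(np.float64)
    refs = np.stack([cube[i, j] for i, j in clicks]).astype(np.float64)  # (K, C)

    # Cosine of the angle between every pixel spectrum and every reference.
    pixels_n = pixels / (np.linalg.norm(pixels, axis=1, keepdims=True) + eps)
    refs_n = refs / (np.linalg.norm(refs, axis=1, keepdims=True) + eps)
    cos = np.clip(pixels_n @ refs_n.T, -1.0, 1.0)  # (H*W, K)

    # Assumed rescaling: angle 0 -> similarity 1, angle >= pi/2 -> similarity 0.
    sim = np.clip(1.0 - np.arccos(cos) / (np.pi / 2), 0.0, 1.0)

    # Keep the highest similarity over all clicks.
    return sim.max(axis=1).reshape(h, w)


def equalize(sim, bins=256):
    """Histogram-equalize a [0, 1] similarity map through its empirical CDF."""
    hist, edges = np.histogram(sim, bins=bins, range=(0.0, 1.0))
    cdf = np.cumsum(hist) / sim.size
    return np.interp(sim.ravel(), edges[1:], cdf).reshape(sim.shape)
```

Because the angle depends only on the direction of a spectrum, a pixel whose spectrum is a scaled copy of the clicked one still receives a similarity close to 1, and the same function applies unchanged to cubes with different band counts.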

RGB Interactive Segmentation. To leverage powerful RGB foundational models for HSI, we first generate pseudo RGB images from the HSI data through spectral band selection and combination [4]. SAM2 [12] is utilized as our RGB segmentation backbone due to its state-of-the-art performance in interactive segmentation tasks and its robust generalization capabilities across diverse imaging domains, including medical [11]. SAM2 generates confidence maps $\hat{Y}_{SAM2} = \text{SAM2}(X_{rgb}, \mathcal{I})$ indicating the likelihood of each pixel belonging to the foreground from the pseudo RGB image. $\text{SAM2}_{Base}(\cdot)$ denotes the SAM2 model with frozen weights, and $\text{SAM2}_{Tuned}(\cdot)$ denotes the fine-tuned version.

RGB and Spectral Similarity. Both spectral and RGB-based models offer complementary information for image segmentation. While SAM2 processes only RGB information, the combination complements our spectral similarity-based segmentation by capturing spatial and contextual features that may not be evident in pure spectral analysis. To enhance segmentation quality, we explored two initial approaches for fusing the spectral and spatial similarity maps. The first is a simple intersection method where the similarity maps are directly multiplied: $\hat{Y}_{\text{SAM2SA}_{Intersec.}} = \hat{Y}_{SAM2} \cdot \hat{Y}_{SCF}$. This multiplication produces high values only in regions where both modalities agree, effectively creating a logical AND operation that requires consensus between spectral and spatial information for pixel classification.
For a more sophisticated integration, we implement a UNet architecture [14] that takes the similarity maps $\hat{Y}_{SAM2}$ and $\hat{Y}_{SCF}$ as direct inputs to learn optimal fusion strategies: $\hat{Y}_{\text{SAM2SA}_{UNet}} = \text{SAM2SA}_{UNet}(\hat{Y}_{SAM2}, \hat{Y}_{SCF})$, where $\text{SAM2SA}_{UNet}(\cdot)$ represents the trained fusion UNet model. Unlike the deterministic intersection approach, this learnable fusion should uncover complementary spatial relationships between modalities.
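The intersection fusion is a plain element-wise product, which a small numeric example makes concrete. The function name and the toy values below are illustrative only.

```python
import numpy as np


def intersection_fusion(y_sam2, y_scf):
    """Element-wise product of two [0, 1] maps, acting as a soft logical AND."""
    return y_sam2 * y_scf


y_sam2 = np.array([[0.9, 0.9], [0.1, 0.1]])  # RGB branch confidence (toy values)
y_scf = np.array([[0.9, 0.2], [0.9, 0.2]])   # spectral similarity (toy values)

fused = intersection_fusion(y_sam2, y_scf)
mask = fused >= 0.5  # only the top-left pixel, where both branches agree, survives
```

Note that the product of two values below 1 is smaller than either factor, so at a fixed 0.5 threshold this fusion is conservative: two moderately confident scores of 0.7 multiply to 0.49 and fall below the decision boundary.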

SAMSA. To further improve segmentation, we introduce SAMSA, a novel model that fuses spectral similarity with high-resolution spatial features from SAM2. Unlike the aforementioned fusion approaches that combine outputs after segmentation, SAMSA integrates spectral information directly into the upscaling process of the SAM2 mask decoder. A high-level overview of this process is shown in Fig. 1. Given $X_{rgb}$ and $\mathcal{I}$, SAMSA follows the standard SAM2 processing pipeline and additionally integrates the spectral information. The spectral similarity map $\hat{Y}_{SCF}$ is fused with the high-resolution feature maps $S_0$ extracted from SAM2’s encoder, enhancing segmentation decisions based on spectral properties. This allows the model to leverage spectral characteristics that are not visible in pseudo RGB while maintaining SAM2’s spatial precision. We freeze the prompt and image encoders from SAM2, fine-tuning only the lightweight mask decoder. This enables SAMSA to generalize to medical datasets with minimal training data while learning how to effectively combine spatial and spectral information.

3 Experimental Results

For training of the fusion models we mainly follow SAM’s optimization procedure [7]. All models are trained with a combined loss function using DICE and cross-entropy loss with equal weighting, excluding any unlabeled regions. Complete implementation details are provided in the accompanying source code, accessible upon acceptance of this manuscript (REDACTED_CODE_REPOSITORY_LINK).
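A combined loss of this kind can be sketched in NumPy for the binary foreground/background case. This is an illustration under stated assumptions rather than the training code: the soft DICE formulation, the 0.5/0.5 reading of "equal weighting", the `eps` smoothing, and the function name are all choices made for the example.

```python
import numpy as np


def dice_ce_loss(prob, target, labeled, eps=1e-6):
    """Equal-weight soft DICE + binary cross-entropy over labeled pixels only.

    prob:    (H, W) predicted foreground probabilities in [0, 1].
    target:  (H, W) binary ground truth (1 = foreground).
    labeled: (H, W) boolean mask, False where the pixel is unlabeled.
    """
    # Unlabeled pixels are dropped before either term is computed.
    p = np.clip(prob[labeled], eps, 1.0 - eps)
    t = target[labeled].astype(np.float64)

    # Soft DICE loss: 1 - 2 * sum(p * t) / (sum(p) + sum(t)).
    dice = (2.0 * np.sum(p * t) + eps) / (np.sum(p) + np.sum(t) + eps)
    dice_loss = 1.0 - dice

    # Binary cross-entropy averaged over the labeled pixels.
    ce_loss = -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))

    # Assumed interpretation of "equal weighting".
    return 0.5 * dice_loss + 0.5 * ce_loss
```

Masking before the reduction means predictions inside unlabeled regions contribute neither gradient nor penalty, which matches the stated exclusion of unlabeled regions.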

Datasets. The HiB dataset includes hyperspectral and pseudo RGB images from 34 patients, with patient-wise fold splits [8]. It features four labeled classes: Background, Tumor, Healthy, and Vasculature, plus an Unlabeled category. Following preprocessing as in [9], the dataset consists of 128 spectral bands. The HeiPorSPECTRAL (Heipor) dataset, collected from 20 porcine subjects at Heidelberg University Hospital, provides HSI data with annotations for 20 distinct organs. Spectral information ranges from 500 nm to 1000 nm, and corresponding RGB images are derived from the HSI data [19].

Evaluation Protocol. We evaluate each model on foreground/background segmentation using a single user click, following SAM2’s evaluation procedure [7]. For each class, we select a click position at the center of the largest connected component in the foreground region to avoid boundary ambiguity. We report two key metrics: (1) D@0.5, the DICE score [18] using the standard decision boundary of $0.5$; and (2) D@Max, the maximum DICE across all thresholds, representing optimal performance without predefined decision boundaries.
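The two metrics can be computed from a similarity map as sketched below. The threshold grid (101 evenly spaced values), the convention that DICE is 1.0 when both masks are empty, and the function names are assumptions for illustration.

```python
import numpy as np


def dice(pred, gt):
    """DICE between two boolean masks; defined here as 1.0 when both are empty."""
    inter = np.logical_and(pred, gt).sum()
    total = pred.sum() + gt.sum()
    return 1.0 if total == 0 else float(2.0 * inter / total)


def dice_at_half_and_max(sim, gt, labeled=None, num_thresholds=101):
    """Return (D@0.5, D@Max) for a [0, 1] similarity map and a binary label."""
    gt = gt.astype(bool)
    if labeled is None:
        labeled = np.ones(gt.shape, dtype=bool)
    s, g = sim[labeled], gt[labeled]

    d_half = dice(s >= 0.5, g)
    # Sweep thresholds; include d_half so that D@Max >= D@0.5 always holds.
    sweep = [dice(s >= t, g) for t in np.linspace(0.0, 1.0, num_thresholds)]
    return d_half, max([d_half] + sweep)


# A low-contrast map that separates the classes perfectly, just not at 0.5.
sim = np.array([[0.4, 0.4], [0.2, 0.2]])
gt = np.array([[1, 1], [0, 0]])
d_half, d_max = dice_at_half_and_max(sim, gt)  # (0.0, 1.0)
```

The toy map shows why an untrained similarity function can score poorly on D@0.5 yet well on D@Max: the ranking of pixels is correct, but the values are not calibrated around the 0.5 boundary.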

We report macro-averaged (Macro) and per-class results. We also evaluate multi-click performance by placing subsequent clicks on the target foreground class. Finally, for trainable models, we conduct N-shot evaluations (1, 3, 5, 10, and 20 examples) to analyze the relationship between training data availability and segmentation quality.
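The single-click placement rule from the evaluation protocol can be sketched without external dependencies. The text does not say how the "center" is defined, so this example assumes the centroid of the largest 4-connected component, snapped to the nearest pixel of that component so the click never falls outside a non-convex region; both choices, and the function names, are assumptions.

```python
from collections import deque

import numpy as np


def largest_component(mask):
    """Boolean mask of the largest 4-connected foreground component."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    best = []
    for si, sj in zip(*np.nonzero(mask)):
        if seen[si, sj]:
            continue
        # Breadth-first flood fill starting from an unvisited foreground pixel.
        seen[si, sj] = True
        queue, comp = deque([(si, sj)]), []
        while queue:
            i, j = queue.popleft()
            comp.append((i, j))
            for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
                if 0 <= ni < h and 0 <= nj < w and mask[ni, nj] and not seen[ni, nj]:
                    seen[ni, nj] = True
                    queue.append((ni, nj))
        if len(comp) > len(best):
            best = comp
    out = np.zeros((h, w), dtype=bool)
    if best:
        rows, cols = zip(*best)
        out[list(rows), list(cols)] = True
    return out


def center_click(mask):
    """Click at the component pixel closest to the largest component's centroid."""
    comp = np.argwhere(largest_component(mask))
    if comp.size == 0:
        return None  # the class is absent from this image
    centroid = comp.mean(axis=0)
    nearest = comp[np.argmin(((comp - centroid) ** 2).sum(axis=1))]
    return int(nearest[0]), int(nearest[1])


mask = np.zeros((8, 8), dtype=bool)
mask[1:4, 1:4] = True  # 3x3 block: the largest component
mask[6, 6] = True      # isolated pixel: ignored
click = center_click(mask)  # (2, 2)
```

How the subsequent clicks in the multi-click setting are chosen is not specified here, so the sketch covers only the first click.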

Table 1: Macro Results: Performance of models with varying input modalities (Mod.) and user clicks. Bold indicates peak performance per metric and dataset.

| Mod. | Model | Heipor, 1 click, D@0.5 | Heipor, 1 click, D@Max | Heipor, 5 clicks, D@0.5 | Hib, 1 click, D@0.5 | Hib, 1 click, D@Max | Hib, 5 clicks, D@0.5 |
|---|---|---|---|---|---|---|---|
| HSI | PCC | 0.122 | 0.472 | 0.117 | 0.373 ± 0.019 | 0.885 ± 0.034 | 0.375 ± 0.018 |
| HSI | SA | 0.117 | 0.489 | 0.117 | 0.374 ± 0.019 | 0.889 ± 0.035 | 0.374 ± 0.019 |
| HSI | $\text{SA}_{Equalized}$ | 0.205 | 0.487 | 0.137 | 0.568 ± 0.038 | 0.885 ± 0.034 | 0.482 ± 0.033 |
| RGB | $\text{SAM2}_{Base}$ | 0.600 | 0.773 | 0.643 | 0.523 ± 0.056 | 0.727 ± 0.043 | 0.591 ± 0.069 |
| RGB | $\text{SAM2}_{Tuned}$ | 0.806 | 0.864 | 0.886 | 0.771 ± 0.059 | 0.905 ± 0.036 | 0.912 ± 0.025 |
| Fusion | $\text{SAM2SA}_{Intersec.}$ | 0.634 | 0.755 | 0.647 | 0.605 ± 0.048 | 0.832 ± 0.033 | 0.674 ± 0.083 |
| Fusion | $\text{SAM2SA}_{UNet}$ | 0.692 | 0.798 | 0.771 | 0.650 ± 0.115 | 0.778 ± 0.123 | 0.673 ± 0.096 |
| Fusion | SAMSA (ours) | 0.811 | 0.863 | 0.892 | **0.810** ± 0.050 | **0.929** ± 0.028 | **0.934** ± 0.031 |

In our evaluation of spectral similarity functions, SA outperformed PCC with improvements of $+0.017$ on Heipor and $+0.004$ on Hib datasets when measured by D@Max (Table 1). We further enhanced SA with equalization ($SA_{Equalized}$), improving contrast around the 0.5 threshold to better align with RGB models, and adopted this as our spectral analysis method for subsequent experiments.

For RGB-only performance (Table 1), $\text{SAM2}_{Base}$ demonstrated reasonable generalization to medical domains, achieving $0.600$ Macro D@0.5 on Heipor. However, Table 2 reveals significant weaknesses on the Hib dataset’s Vascular class ($0.335$), indicating limited generalization to domain-specific medical structures. Fine-tuning substantially improved performance, with $\text{SAM2}_{Tuned}$ achieving $0.757$ on Vascular and $0.869$ on Background classes.

Our analysis of fusion strategies revealed that late fusion approaches, namely $\text{SAM2SA}_{Intersec.}$ and $\text{SAM2SA}_{UNet}$, underperformed compared to $\text{SAM2}_{Tuned}$, though they improved upon $\text{SAM2}_{Base}$. This suggests spectral information requires earlier integration to enhance segmentation performance, which we implemented in SAMSA.

SAMSA consistently outperformed $\text{SAM2}_{Tuned}$ across all classes on D@0.5 (Table 2), with notable improvements of $+0.056$ for Healthy and $+0.060$ for Tumor classes. Macro D@0.5 scores increased by $+0.039$ for Hib and $+0.005$ for Heipor. The modest gains on Heipor can be attributed to its RGB-oriented annotations and predominance of large, centered objects (Fig. 3). These characteristics are particularly favorable for RGB-only models that detect visual boundaries, as evidenced by the strong zero-shot performance of $\text{SAM2}_{Base}$ ($0.773$ D@Max), which trails the fine-tuned version by only $0.091$ D@Max (Table 1). For this reason, we focused our per-class metric analysis on the Hib dataset, where spectral information provides more substantial benefits for segmentation.

As expected, additional clicks improved segmentation performance for all fine-tuned models. SAMSA showed significant improvements with 5-click inputs, increasing performance of D@0.5 by $+0.081$ on Heipor and $+0.124$ on Hib. In Fig. 2 we demonstrate SAMSA’s superiority over $\text{SAM2}_{Tuned}$ across different click counts on Hib, achieving $0.95$ Macro D@0.5 with 5 clicks. Furthermore, with only 20 training examples, SAMSA achieves $0.79$ Macro D@0.5 for single-click segmentation. Leveraging foundation models, both SAMSA and SAM2 perform well in limited-data scenarios. Notably, the integration of spectral information consistently enhances the training process, with a clear performance gap between SAMSA and $\text{SAM2}_{Tuned}$ emerging at just 5 training examples, highlighting the advantage of spectral information in low-data regimes.

Figure 2: Performance analysis on Hib dataset: a) Number of clicks and b) Number of shots in training and correlation to model performance.

Generalization Results. We conduct a leave-one-class-out experiment on both fine-tuned SAM2 and SAMSA by removing the Tumor class from training while testing across all classes on Hib, simulating real-world scenarios requiring identification of novel structures without prior supervision.

Table 2: Class results D@0.5 for Hib dataset using 1 click.

| Model | Macro | Background | Healthy | Vascular | Tumor |
|---|---|---|---|---|---|
| $SA_{Equalized}$ | 0.568 ± 0.038 | 0.613 ± 0.109 | 0.815 ± 0.093 | 0.506 ± 0.130 | 0.339 ± 0.100 |
| $\text{SAM2}_{Base}$ | 0.523 ± 0.056 | 0.552 ± 0.079 | 0.586 ± 0.083 | 0.335 ± 0.095 | 0.619 ± 0.188 |
| $\text{SAM2}_{Tuned}$ | 0.771 ± 0.059 | 0.869 ± 0.045 | 0.778 ± 0.081 | 0.757 ± 0.106 | 0.678 ± 0.098 |
| SAMSA (ours) | **0.810** ± 0.050 | **0.881** ± 0.039 | **0.834** ± 0.066 | **0.790** ± 0.117 | **0.738** ± 0.041 |
| 0-shot case - excluded Tumor class from train | | | | | |
| $\text{SAM2}_{Tuned}$ | 0.708 ± 0.055 | 0.853 ± 0.063 | 0.735 ± 0.080 | 0.704 ± 0.077 | 0.538 ± 0.125 |
| SAMSA (ours) | **0.760** ± 0.053 | **0.881** ± 0.048 | **0.821** ± 0.081 | **0.763** ± 0.095 | **0.576** ± 0.072 |

As seen in Table 2, when the Tumor class is excluded from training, $\text{SAM2}_{Tuned}$ performance on Tumor drops by $0.140$ (from $0.678$ to $0.538$), falling below even $\text{SAM2}_{Base}$ performance for tumor detection ($0.619$). Despite this, its overall Macro performance remains significantly better than that of $\text{SAM2}_{Base}$ ($+0.185$). Similarly, SAMSA experiences a performance decrease on the Tumor class ($-0.162$, from $0.738$ to $0.576$), but crucially still outperforms $\text{SAM2}_{Tuned}$ on this unseen class ($0.576$ vs. $0.538$). Additionally, SAMSA achieves a higher overall Macro result than $\text{SAM2}_{Tuned}$ ($+0.052$), suggesting that incorporating spectral information provides meaningful advantages for generalizing to unseen classes.

Table 3: Model performance Macro D@0.5 using cross and mixed training. Columns give the training set and the model; rows give the training-set properties and the test set.

| | None: $\text{SAM2}_{Base}$ (0-shot) | Heipor: $\text{SAM2}_{Tuned}$ | Heipor: SAMSA | Hib: $\text{SAM2}_{Tuned}$ | Hib: SAMSA | Mixed: $\text{SAM2}_{Tuned}$ | Mixed: SAMSA |
|---|---|---|---|---|---|---|---|
| HSI Channels (training) | - | 100 | 100 | 128 | 128 | 238 | 238 |
| Num Classes (training) | - | 20 | 20 | 4 | 4 | 24 | 24 |
| Test: Heipor | 0.600 | 0.806 | 0.811 | 0.445 | 0.433 | 0.807 | 0.810 |
| Test: Hib | 0.523 | 0.454 | 0.497 | 0.771 | 0.810 | 0.695 | 0.765 |

Our approach also uniquely enables training across datasets with different spectral properties by collapsing spectral information to a single channel regardless of band count or resolution. In Table 3, cross-dataset generalization (training on one dataset, testing on another) performs poorly, falling even below the zero-shot $\text{SAM2}_{Base}$ baseline. However, mixed training significantly improves results. While $\text{SAM2}_{Tuned}$ shows inconsistent benefits from mixed training (improved on Heipor, decreased on Hib), SAMSA maintains balanced performance, outperforming $\text{SAM2}_{Tuned}$ on both datasets (Hib $+0.070$, Heipor $+0.003$). This confirms SAMSA’s ability to generalize across heterogeneous HSI datasets with varying spectral properties and clinical domains.
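The band-count independence can be illustrated with a comparison function applied to two cubes of different depth. The sketch below uses the Pearson correlation for variety; the rescaling of the correlation from $[-1, 1]$ to $[0, 1]$, the function name, and the random placeholder cubes (sized to echo the 100-band and 128-band datasets in Table 3) are assumptions for illustration.

```python
import numpy as np


def pcc_similarity(cube, click, eps=1e-12):
    """Single-channel (H, W) Pearson correlation map for one click, in [0, 1]."""
    h, w, c = cube.shape
    pixels = cube.reshape(-1, c).astype(np.float64)
    ref = cube[click].astype(np.float64)

    # Center each spectrum over its bands, then correlate with the reference.
    pixels_c = pixels - pixels.mean(axis=1, keepdims=True)
    ref_c = ref - ref.mean()
    denom = np.linalg.norm(pixels_c, axis=1) * np.linalg.norm(ref_c) + eps
    r = np.clip((pixels_c @ ref_c) / denom, -1.0, 1.0)

    # Assumed rescaling from [-1, 1] to [0, 1].
    return ((r + 1.0) / 2.0).reshape(h, w)


rng = np.random.default_rng(0)
cube_100 = rng.random((32, 32, 100))  # placeholder for a 100-band image
cube_128 = rng.random((32, 32, 128))  # placeholder for a 128-band image

# Both cubes collapse to maps of identical shape and can share one batch.
batch = np.stack([pcc_similarity(cube_100, (16, 16)),
                  pcc_similarity(cube_128, (16, 16))])
```

Since the network only ever sees the resulting single-channel map, nothing downstream depends on how many bands the sensor recorded, which is what makes mixed training across datasets possible.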

In Fig. 3 we present qualitative results on the Hib dataset. When clicking on vascular tissue (a), $\text{SAM2}_{Tuned}$ (d) struggles to effectively segment the vascular class without spectral information. The SA map (e) clearly identifies vascular structures but introduces noise around the tumor region. In contrast, SAMSA (f) produces a well-localized probability map for vascular tissue. For the Heipor dataset, clicking on small bowel tissue (g) demonstrates SAMSA’s ability to precisely delineate class boundaries compared to the ground truth (h).

Figure 3: Comparison of results. (a) RGB image with vascular click. (b) Corresponding label image, where Tumour is red, Vascular structures are blue, Healthy tissue is green, Background non-tissue structures are black, and Unlabeled regions are white. (c) SAMSA prediction. (d-f) Probability maps from SAM2, SA, and SAMSA. (g) RGB image with a small bowel click. (h) Corresponding label image, where Small Bowel is gray and Background is black. (i) SAMSA prediction.

4 Conclusion

SAMSA is a unique method for generalizing across different HSI datasets, enabling effective segmentation in scenarios with limited training data and diverse imaging conditions. The proposed framework’s ability to combine spectral and RGB information provides significant advantages, particularly in detecting challenging medical structures and maintaining performance across different datasets. Our approach shows promise in handling unseen classes and adapting to heterogeneous HSI datasets under low data regimes, opening new possibilities for flexible and robust hyperspectral interactive medical image analysis.

References

  • [1] Anichini, G., Leiloglou, M., Hu, Z., O’Neill, K., Elson, D.: Hyperspectral and multispectral imaging in neurosurgery: a systematic literature review and meta-analysis. European Journal of Surgical Oncology p. 108293 (2024). https://doi.org/10.1016/j.ejso.2024.108293, https://www.sciencedirect.com/science/article/pii/S0748798324003457
  • [2] Boardman, J.: Spectral angle mapping: a rapid measure of spectral similarity. AVIRIS. Delivered by Ingenta (1993)
  • [3] Clancy, N.T., Jones, G., Maier-Hein, L., Elson, D.S., Stoyanov, D.: Surgical spectral imaging. Medical Image Analysis 63, 101699 (July 2020). https://doi.org/10.1016/j.media.2020.101699, epub 2020 Apr 13
  • [4] Czempiel, T., Roddan, A., Leiloglou, M., Hu, Z., O’Neill, K., Anichini, G., Stoyanov, D., Elson, D.: RGB to hyperspectral: Spectral reconstruction for enhanced surgical imaging. Healthcare Technology Letters 11(6), 307–317 (2024). https://doi.org/10.1049/htl2.12098, https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/htl2.12098
  • [5] Gonzalez, R.C., Woods, R.E.: Digital Image Processing. Pearson, New York, NY, fourth edition, global edition edn. (2018)
  • [6] Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016), http://www.deeplearningbook.org, book in preparation for MIT Press
  • [7] Kirillov, A., Mintun, E., Ravi, N., Mao, H., Rolland, C., Gustafson, L., Xiao, T., Whitehead, S., Berg, A.C., Lo, W.Y., Dollár, P., Girshick, R.: Segment anything (2023), https://arxiv.org/abs/2304.02643
  • [8] Leon, R., Fabelo, H., Ortega, S., Cruz-Guerrero, I., Campos-Delgado, D., Szolna, A., Piñeiro, J., Espino, C., O’Shanahan, A., Hernandez, M., Carrera, D., Bisshopp, S., Sosa, C., Balea-Fernandez, F., Morera, J., Clavo, B., Callico, G.: Hyperspectral imaging benchmark based on machine learning for intraoperative brain tumour detection. NPJ Precision Oncology 7(1),  119 (November 2023). https://doi.org/10.1038/s41698-023-00475-9
  • [9] Martinez, B., Leon, R., Fabelo, H., Ortega, S., Piñeiro, J.F., Szolna, A., Hernandez, M., Espino, C., O’Shanahan, A.J., Carrera, D., et al.: Most relevant spectral bands identification for brain cancer detection using hyperspectral imaging. Sensors 19(24), 5481 (2019)
  • [10] Meneses, P.R.: Spectral correlation mapper (SCM): An improvement on the spectral angle mapper (SAM) (2000)
  • [11] Murali, A., Mascagni, P., Mutter, D., Padoy, N.: CycleSAM: One-shot surgical scene segmentation using cycle-consistent feature matching to prompt SAM (2024), https://arxiv.org/abs/2407.06795
  • [12] Ravi, N., Gabeur, V., Hu, Y.T., Hu, R., Ryali, C., Ma, T., Khedr, H., Rädle, R., Rolland, C., Gustafson, L., Mintun, E., Pan, J., Alwala, K.V., Carion, N., Wu, C.Y., Girshick, R., Dollár, P., Feichtenhofer, C.: SAM 2: Segment anything in images and videos (2024), https://arxiv.org/abs/2408.00714
  • [13] Roddan, A., Czempiel, T., Elson, D.S., Giannarou, S.: Calibration-jitter: Augmentation of hyperspectral data for improved surgical scene segmentation. Healthcare Technology Letters 11(6), 345–354 (2024). https://doi.org/10.1049/htl2.12102, https://ietresearch.onlinelibrary.wiley.com/doi/abs/10.1049/htl2.12102
  • [14] Ronneberger, O., Fischer, P., Brox, T.: U-Net: Convolutional networks for biomedical image segmentation (2015), https://arxiv.org/abs/1505.04597
  • [15] Seidlitz, S., Sellner, J., Odenthal, J., Özdemir, B., Studier-Fischer, A., Knödler, S., Ayala, L., Adler, T.J., Kenngott, H.G., Tizabi, M., Wagner, M., Nickel, F., Müller-Stich, B.P., Maier-Hein, L.: Robust deep learning-based semantic organ segmentation in hyperspectral images. Medical Image Analysis 80, 102488 (2022). https://doi.org/10.1016/j.media.2022.102488, https://www.sciencedirect.com/science/article/pii/S1361841522001359
  • [16] Shapey, J., Xie, Y., Nabavi, E., Bradford, R., Saeed, S.R., Ourselin, S., Vercauteren, T.: Intraoperative multispectral and hyperspectral label-free imaging: A systematic review of in vivo clinical studies. Journal of Biophotonics 12(9), e201800455 (Sep 2019). https://doi.org/10.1002/jbio.201800455, epub 2019 Apr 29
  • [17] Shapey, J., Xie, Y., Nabavi, E., Bradford, R., Saeed, S.R., Ourselin, S., Vercauteren, T.: Intraoperative multispectral and hyperspectral label-free imaging: A systematic review of in vivo clinical studies. Journal of Biophotonics 12(9), e201800455 (2019). https://doi.org/10.1002/jbio.201800455
  • [18] Sørensen, T.: A method of establishing groups of equal amplitude in plant sociology based on similarity of species content and its application to analyses of the vegetation on Danish commons (1948)
  • [19] Studier-Fischer, A., Seidlitz, S., Sellner, J., Bressan, M., Özdemir, B., Ayala, L., Odenthal, J., Knoedler, S., Kowalewski, K.F., Haney, C.M., Salg, G., Dietrich, M., Kenngott, H., Gockel, I., Hackert, T., Müller-Stich, B.P., Maier-Hein, L., Nickel, F.: HeiPorSPECTRAL - the Heidelberg porcine hyperspectral imaging dataset of 20 physiological organs. Scientific Data 10(1), 414 (June 2023). https://doi.org/10.1038/s41597-023-02315-8
  • [20] Wang, G., Li, W., Zuluaga, M.A., et al.: Interactive medical image segmentation using deep learning with image-specific fine tuning. IEEE Transactions on Medical Imaging 37(7), 1562–1573 (2018). https://doi.org/10.1109/TMI.2018.2791721
  • [21] Wang, G., Zuluaga, M.A., Li, W., Pratt, R., Patel, P.A., Aertsen, M., Doel, T., David, A.L., Deprest, J., Ourselin, S., Vercauteren, T.: DeepIGeoS: A deep interactive geodesic framework for medical image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence 41(7), 1559–1572 (Jul 2019). https://doi.org/10.1109/TPAMI.2018.2840695
  • [22] Zhao, F., Xie, X.: An overview of interactive medical image segmentation. Annals of the BMVA 2013(7), 1–22 (2013)