Consistent Point Matching

Halid Ziya Yerebakan, Gerardo Hermosillo Valadez
Siemens Medical Solutions, Malvern, USA

Abstract

This study demonstrates that incorporating a consistency heuristic into the point-matching algorithm [1] improves robustness in matching anatomical locations across pairs of medical images. We validated our approach on diverse longitudinal internal and public datasets spanning CT and MRI modalities. Notably, it surpasses state-of-the-art results on the Deep Lesion Tracking dataset. Additionally, we show that the method effectively addresses landmark localization. The algorithm operates efficiently on standard CPU hardware and allows configurable trade-offs between speed and robustness. The method enables high-precision navigation between medical images without requiring a machine learning model or training data.

Introduction

With the advancement of digitalization in healthcare, medical imaging data are accumulating at an accelerated pace. Comparisons between previously acquired scans are increasingly valuable but often require redundant navigation to the same anatomical locations in 3D volumes. Registration methods can address this challenge, but they are computationally demanding. Thus, in practice, they are implemented only in limited use cases. As an alternative, less accurate landmark-based registration approaches remain in use, risking the loss of semantic relationships at key points of interest.

Recent literature on machine learning in medical imaging emphasizes voxel-level representation learning for semantic description [2, 3, 4], which facilitates the identification of corresponding points via coarse-to-fine maximum similarity approaches. Bai et al. [5] demonstrated state-of-the-art results on the Deep Lesion Tracking dataset [4, 6] by applying the consistency heuristic during training and incorporating semantic information. Ongoing work seeks better voxel representations for medical images using self-supervised methods [7].

As an alternative solution to the matching problem, we introduced a “Point Matching” method that avoids full registration and quickly identifies the corresponding locations between multiple medical images [1, 8], without the need for a machine learning model. It uses a multi-resolution sparse sampling descriptor to effectively capture anatomical information, as evidenced by an organ classifier [9]. Point matching is not a full registration technique and lacks the robustness of traditional methods. In this paper, we enhance its robustness by applying the consistency heuristic of Bai et al.’s UAE [5], incorporating consistency across multiple resolution levels within the similarity search. Consistency-based point matching improves robustness in the four datasets we tested. Unlike the UAE method, our algorithm requires no training, operates in approximately two seconds on standard CPU hardware, and allows easy adjustment of speed-accuracy trade-offs.

Methods

Consistent Point Matching uses components of Point Matching as building blocks. We briefly describe these components in the next subsection and show how the method can be made more robust by incorporating a consistency heuristic.

Point Matching

Point matching [1] is a hierarchical descriptor search method designed to efficiently identify the corresponding anatomical points between pairs of images. Its core efficiency arises from generating descriptors through simple memory lookups and enabling rapid scaling of the descriptor to different resolutions dynamically. The multi-resolution strategy in both the search and descriptor definition allows the method to effectively capture fine anatomical details as well as broader global context.

Descriptors are created by sparse sampling of intensity values on predefined offset grids, specified as millimeter displacements thanks to the reference frame metadata in medical images. Instead of resampling images, the sampler function adapts to the individual voxel spacings, avoiding additional memory and computation. An illustrative example of this sparse sampling and the subsequent reconstruction of descriptors is shown in Figure 1 on the center slice of the query. In this specific example, 7×7×7 grids with 8, 20, 48, and 128 mm spacings are used.

Figure 1: (a) Sparse sampler function on pre-defined offsets
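The sparse sampling described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the volume is axis-aligned with its origin at voxel index (0, 0, 0), samples are taken by nearest-voxel lookup, and positions outside the volume are filled with zero. The names `make_offset_grid` and `sample_descriptor` are hypothetical and are not taken from the original implementation.

```python
import numpy as np

def make_offset_grid(n=7, spacing_mm=8.0):
    """Return an (n**3, 3) array of millimeter offsets centered on zero."""
    ticks = (np.arange(n) - (n - 1) / 2.0) * spacing_mm
    gz, gy, gx = np.meshgrid(ticks, ticks, ticks, indexing="ij")
    return np.stack([gz.ravel(), gy.ravel(), gx.ravel()], axis=1)

def sample_descriptor(volume, voxel_spacing_mm, point_mm, offsets_mm, scale=1.0):
    """Sample intensities at point_mm + scale * offsets_mm by nearest-voxel lookup.

    The image is never resampled: millimeter positions are divided by the
    voxel spacing to obtain memory indexes. Samples that fall outside the
    volume are set to 0 (an assumption of this sketch).
    """
    positions_mm = np.asarray(point_mm, dtype=float) + scale * offsets_mm
    spacing = np.asarray(voxel_spacing_mm, dtype=float)
    idx = np.rint(positions_mm / spacing).astype(int)
    inside = np.all((idx >= 0) & (idx < np.array(volume.shape)), axis=1)
    descriptor = np.zeros(len(offsets_mm), dtype=volume.dtype)
    valid = idx[inside]
    descriptor[inside] = volume[valid[:, 0], valid[:, 1], valid[:, 2]]
    return descriptor

# Concatenate the four grid spacings mentioned in the text.
offsets = np.concatenate([make_offset_grid(7, s) for s in (8, 20, 48, 128)])

# Synthetic volume with 5 mm slices and 1 mm in-plane spacing (z, y, x order).
rng = np.random.default_rng(0)
volume = rng.integers(0, 4096, size=(40, 128, 128)).astype(np.int16)
desc = sample_descriptor(volume, (5.0, 1.0, 1.0), (100.0, 64.0, 64.0), offsets)
```

Passing a `scale` below 1 shrinks the same offset grid around the query point, which is one way to obtain the finer-resolution descriptors used at deeper search levels.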

The sampler can be easily scaled to perform matching at both fine and coarse resolutions. Offsets defined in millimeters can be multiplied by a scaling factor to enable zoom-in functionality and adjust sampling memory indexes. The Point Matching method specifically employs five hierarchical levels. At each level, the method iterates over a grid of candidate locations in the target image and selects the best candidate on the basis of a similarity measure that combines mutual information and cosine similarity. Once the optimal point is found, the algorithm advances to the next level, focusing on a smaller, higher-resolution region. Due to the independent calculation of similarities, the method achieves a high degree of parallelization.
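The level-by-level search can be illustrated with a short Python sketch. It assumes a generic `score` callable in place of the combined mutual information and cosine similarity, a fixed candidate grid of 5×5×5 locations per level, and step sizes halving from 16 mm to 1 mm. The function name, the grid width, and the toy score are assumptions made for illustration only; in particular, the first level here does not cover the whole image.

```python
import itertools
import numpy as np

def coarse_to_fine_search(score, start_mm, steps_mm=(16, 8, 4, 2, 1), half_width=2):
    """Refine a 3D location level by level.

    At each level, every candidate on a (2 * half_width + 1)^3 grid around the
    current center is scored, and the search moves to the best candidate.
    """
    center = np.asarray(start_mm, dtype=float)
    ticks = range(-half_width, half_width + 1)
    for step in steps_mm:
        candidates = [center + step * np.array(k, dtype=float)
                      for k in itertools.product(ticks, repeat=3)]
        # Scores are independent of each other, so this loop parallelizes well.
        scores = [score(c) for c in candidates]
        center = candidates[int(np.argmax(scores))]
    return center

# Toy score: higher is better, peaking at a hidden target location.
hidden = np.array([37.0, -12.0, 5.0])
found = coarse_to_fine_search(lambda p: -np.linalg.norm(p - hidden), start_mm=(0, 0, 0))
```

With a real similarity measure the score surface is not this smooth, which is the motivation for the robustness improvements discussed next.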

Consistent Point Matching

Consistency, in our context, is defined as the distance from the original query point to its round-trip estimate back on the source image. First, we map the query to the target image and then map that point back to the source image. If it returns exactly to the original query point, the consistency distance is zero. Figure 2 illustrates this process (the yellow dot marks the mapped-back location). Using this metric, the heuristic is that points with lower consistency distances are more likely to be accurate correspondences.
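As a minimal illustration of this definition, the following Python sketch computes the consistency distance for a query point given two mapping functions. The callables `forward` and `backward` are hypothetical stand-ins for matching from source to target and from target back to source; the toy mappings are pure translations chosen only to make the round trip easy to follow.

```python
import numpy as np

def consistency_distance(query, forward, backward):
    """Distance between `query` and its round-trip estimate in the source image."""
    query = np.asarray(query, dtype=float)
    target_point = forward(query)        # source -> target
    round_trip = backward(target_point)  # target -> source
    return float(np.linalg.norm(query - round_trip))

shift = np.array([10.0, -4.0, 2.5])
forward = lambda p: p + shift
exact_backward = lambda p: p - shift
# A backward mapping that lands 5 mm away from the original query.
biased_backward = lambda p: p - shift + np.array([3.0, 0.0, 4.0])

d_exact = consistency_distance([50, 60, 70], forward, exact_backward)
d_biased = consistency_distance([50, 60, 70], forward, biased_backward)
```

Under the heuristic, the first match (zero round-trip distance) would be trusted more than the second.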

We compute the consistencies for multiple nearby locations and incorporate them into the similarity function. Assuming that nearby points have similar offsets in their corresponding positions, the match found for each neighbor yields an estimate of the target point once the neighbor's offset is subtracted. In our implementation, the nearby points are the six axis-aligned neighbors at each of two radii, 1.5 and 0.5 times the step size used at the current point matching level (which starts from 16 mm and goes down to 1 mm), giving 12 neighbors in total. Since each neighbor at each scale level votes for the target location, we do not discard estimates outright; instead, we retain the best five according to the new similarity measure and average them. Note that the mapping occurs twice for each of the 12 neighbor points and for the central query, making a naive implementation 26 times slower. However, batch processing speeds up the multi-descriptor creation in the search operation, bringing the computation time to around 2 s. The procedure is given in Algorithm 1. In addition to the algorithmic change, we have improved the descriptor definition by adding three orthogonal planes with a resolution of 6 mm using a 7×7 2D grid and an 80 mm 3D grid.

Input: query point $Q$, source image $I$, target image $T$
Output: estimated $\mathrm{center}$ point in $T$

$s_{0}\leftarrow 16$
for $\ell\leftarrow 1$ to $5$ do
    $s\leftarrow s_{0}\cdot 2^{-\ell}$
    $\mathcal{O}\leftarrow\{(0,0,0),(\pm 1.5s,0,0),(\pm 0.5s,0,0),(0,\pm 1.5s,0),(0,\pm 0.5s,0),(0,0,\pm 1.5s),(0,0,\pm 0.5s)\}$
    for $i\leftarrow 1$ to $|\mathcal{O}|$ do
        $\mathrm{offset}\leftarrow\mathcal{O}[i]$
        $F_{i}\leftarrow\mathrm{PointMatching}(I,\;Q+\mathrm{offset},\;T\mid\mathrm{center},\ell)$
        $Q^{\prime}_{i}\leftarrow\mathrm{PointMatching}(T,\;F_{i},\;I\mid Q,\ell)$
        $d_{i}\leftarrow\bigl\lVert(Q+\mathrm{offset})-Q^{\prime}_{i}\bigr\rVert$
        $\hat{F}_{i}\leftarrow F_{i}-\mathrm{offset}$
        $w_{i}\leftarrow\exp\bigl(-d_{i}/s_{0}\bigr)\cdot\mathrm{sim}\bigl(Q+\mathrm{offset},F_{i}\bigr)$
    end
    let $\mathcal{S}$ be the indices of the top-5 $w_{i}$
    $\mathrm{center}\leftarrow\dfrac{\sum_{i\in\mathcal{S}}\hat{F}_{i}}{5}$
end

Algorithm 1: Consistent Point Matching

At the first search level, the search range covers the whole image space. The center of the search is then updated at each level, and the search space is reduced to a smaller region, as in point matching. In the algorithm listing, $\mathrm{sim}$ denotes the similarity function, $F_{i}$ is the location found in the target image for the $i$-th offset, and $\hat{F}_{i}$ is the corresponding estimate of the target location for $Q$ itself. Each point matching search is performed for a single level, since the level loop is moved outside. Alternative formulations for incorporating the consistency distance could be considered; however, simple multiplication by the exponentially decaying kernel $\exp\bigl(-d_{i}/s_{0}\bigr)$ is both effective and practical.
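To make the control flow of the listing concrete, here is a hedged Python sketch. The single-level Point Matching search is not reproduced: `match_fwd`, `match_bwd`, and `sim` are hypothetical callables that stand in for it, and the toy mappings below are synthetic translations rather than real image matches. The step schedule follows the listing, $s = s_{0}\cdot 2^{-\ell}$.

```python
import numpy as np

def neighbor_offsets(s):
    """Central point plus 12 axis-aligned neighbors at 1.5*s and 0.5*s."""
    offsets = [np.zeros(3)]
    for axis in range(3):
        for factor in (1.5, 0.5):
            for sign in (1.0, -1.0):
                o = np.zeros(3)
                o[axis] = sign * factor * s
                offsets.append(o)
    return offsets

def consistent_point_matching(query, match_fwd, match_bwd, sim,
                              s0=16.0, levels=5, top_k=5):
    """Sketch of the consistency-weighted search.

    match_fwd(point, center, level) -> location in the target image
    match_bwd(point, center, level) -> location in the source image
    sim(source_point, target_point) -> similarity score
    All three are hypothetical stand-ins for a single-level Point Matching search.
    """
    query = np.asarray(query, dtype=float)
    center = None  # None: the first level searches the whole target image
    for level in range(1, levels + 1):
        s = s0 * 2.0 ** (-level)
        votes, weights = [], []
        for offset in neighbor_offsets(s):
            q = query + offset
            f = match_fwd(q, center, level)      # source -> target
            q_back = match_bwd(f, query, level)  # target -> source
            d = np.linalg.norm(q - q_back)       # consistency distance
            votes.append(f - offset)             # this neighbor's vote for the match of Q
            weights.append(np.exp(-d / s0) * sim(q, f))
        best = np.argsort(weights)[-top_k:]      # indices of the top-k weights
        center = np.mean([votes[i] for i in best], axis=0)
    return center

shift = np.array([10.0, -4.0, 2.5])
query = np.array([50.0, 60.0, 70.0])

def toy_forward(point, center, level):
    # Simulated mismatch for neighbors on the +x side of the query.
    error = np.array([40.0, 0.0, 0.0]) if point[0] > query[0] else np.zeros(3)
    return point + shift + error

def toy_backward(point, center, level):
    return point - shift

estimate = consistent_point_matching(query, toy_forward, toy_backward,
                                     sim=lambda a, b: 1.0)
```

In this toy setup the two mismatched neighbors receive a large round-trip distance and therefore a low weight, so they fall outside the top five and the averaged estimate equals `query + shift`.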

Results

We evaluated our algorithm on longitudinal matching tasks using CT and MR images. Additionally, we assessed carina landmark location estimation on a CT dataset. We compared consistent point matching with point matching. Images were loaded with positive intensity values and clipped to a range of 0 to 4096. No resampling was applied.

Matching Between Longitudinal Studies

In our first experiment, we used the public DeepLesion dataset with the deep lesion tracking annotations provided by [4, 10], which covers different body parts in the CT modality. This dataset presents two challenges: first, it has a limited number of slices along the z-axis; second, the slice thickness is 5 mm or greater in 49% of cases. Only the test set was used, as no training was required. For comparison, we also included the operating point of UAE [5]. Although it is a supervised machine learning method trained on annotated data, it represents the prior state-of-the-art performance (0.841@10mm). Since the other methods were already compared in that prior work, we compared only to UAE. Despite the challenges, consistent point matching achieves 0.892@10mm, while point matching achieves 0.855@10mm (without radius thresholds). The FROC at different distance thresholds is shown in Figure 3(a). The time per match on this dataset was 1.31 s for consistent point matching and 0.16 s for point matching.

In the second experiment, we used an internally curated dataset of 348 location pairs of lung lesions on CT images. Similarly, the FROC curve is obtained by varying the distance threshold to measure the sensitivity of the estimated locations, as shown in Figure 3(b). The algorithm improves the precision and robustness of point matching, achieving 0.954 versus 0.931 at 10 mm. Point matching takes 0.24 seconds per match, while consistent point matching takes 2.26 seconds.
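The sensitivity-versus-threshold curves reported in this section can be computed along the following lines. This Python sketch assumes that a match counts as correct when its Euclidean distance to the annotated location is at most the threshold; the function name and the synthetic points are illustrative, and the exact evaluation protocol of the benchmarks may differ.

```python
import numpy as np

def sensitivity_at(pred_mm, truth_mm, thresholds_mm):
    """Fraction of predictions within each distance threshold of the ground truth."""
    pred = np.asarray(pred_mm, dtype=float)
    truth = np.asarray(truth_mm, dtype=float)
    dist = np.linalg.norm(pred - truth, axis=1)
    return {float(t): float(np.mean(dist <= t)) for t in thresholds_mm}

# Four synthetic matches with errors of 1, 6, 12, and 5 mm.
truth = np.zeros((4, 3))
pred = np.array([[1.0, 0.0, 0.0],
                 [0.0, 6.0, 0.0],
                 [0.0, 0.0, 12.0],
                 [3.0, 4.0, 0.0]])
curve = sensitivity_at(pred, truth, thresholds_mm=[5, 10, 20])
```

A value written as 0.954@10mm in the text corresponds to the entry of such a curve at the 10 mm threshold.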

Multi-Modal Study

We additionally evaluated our method on an in-house study dataset containing multi-time-point CT and MR modalities. This dataset includes aortic aneurysms, intracranial aneurysms (ICA), enlarged lymph nodes, kidney lesions, meningioma, and pulmonary nodule pathologies with a total of 339 pairs of intra-modality matches. In this study, the annotations were provided by multiple annotators. Radiologists were presented with pairs of images along with a description of a predefined finding in the current studies and were asked to find the corresponding locations in previous studies. We used the median of the available annotations as the ground truth.
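As a small illustration of the ground-truth construction, the sketch below takes the median of several annotated locations for one finding. It assumes a component-wise median over the coordinates, which is one plausible reading of the procedure; the coordinates are made up for the example.

```python
import numpy as np

# Three hypothetical annotations (in mm) of the same finding.
annotations_mm = np.array([
    [101.0, 55.0, 210.0],
    [103.0, 54.0, 212.0],
    [140.0, 56.0, 211.0],  # an outlying annotation
])

# Component-wise median across annotators.
ground_truth = np.median(annotations_mm, axis=0)
```

Compared with a mean, the median keeps a single outlying annotation from pulling the reference location away from the consensus.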

The results indicate that our algorithm generalizes across diseases and modalities. Consistent point matching reaches a sensitivity of 95.2% at 10 mm, with an average execution time of 2.46 seconds. The FROC curve is shown in Figure 4(a).

Speed-Precision Trade-off

In this experiment, we varied the number of consistency points to evaluate the speed-precision trade-off, again using the Deep Lesion Tracking dataset. The three-point configuration includes only the $\mathrm{stepsize}/2$ offsets along the laterality axis, the seven-point configuration includes six neighbors, and the 13-point configuration is the full set of neighbors. As shown in Table 1, the most robust results are achieved when 13 points are included. However, even with 3 points, there is a significant gain with only a small time penalty compared to regular point matching. The mean distance drops more than the median, indicating an improvement in robustness.

Method	Mean Distance (mm)	Median Distance (mm)	Time (s)
Point Matching	5.90	3.56	0.12
Consistent Point Matching (3)	4.82	3.16	0.41
Consistent Point Matching (7)	4.65	3.15	0.67
Consistent Point Matching (13)	4.65	3.05	1.06

Table 1: Mean and median distances, and elapsed time for each method. Lower values are better.
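The relative changes implied by Table 1 can be checked with a few lines of Python. The numbers are copied from the table; the helper name `relative_change` is illustrative.

```python
# Values copied from Table 1: (mean distance mm, median distance mm, time s).
table = {
    "Point Matching": (5.90, 3.56, 0.12),
    "Consistent Point Matching (3)": (4.82, 3.16, 0.41),
    "Consistent Point Matching (7)": (4.65, 3.15, 0.67),
    "Consistent Point Matching (13)": (4.65, 3.05, 1.06),
}

def relative_change(table, baseline="Point Matching"):
    """Percentage drop in mean/median distance and slowdown versus the baseline."""
    base_mean, base_median, base_time = table[baseline]
    summary = {}
    for name, (mean, median, seconds) in table.items():
        if name == baseline:
            continue
        summary[name] = {
            "mean_drop_pct": 100.0 * (base_mean - mean) / base_mean,
            "median_drop_pct": 100.0 * (base_median - median) / base_median,
            "slowdown": seconds / base_time,
        }
    return summary

summary = relative_change(table)
for name, row in summary.items():
    print(f"{name}: mean -{row['mean_drop_pct']:.1f}%, "
          f"median -{row['median_drop_pct']:.1f}%, "
          f"{row['slowdown']:.1f}x time")
```

In every configuration the relative drop in mean distance exceeds the drop in median distance, which is consistent with the gain coming mainly from fewer large errors.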

Landmark Localization Using Atlas Annotation

Lastly, we used consistent point matching for detecting carina landmarks in CT images. We collected 209 annotations on CT scans for testing and then used a single template landmark on a separate atlas image to find the corresponding locations in other images. We evaluated the localization performance using an FROC curve, as shown in Figure 4(b). Quantitatively, point matching achieves 0.933@10mm, whereas consistent point matching achieves 0.985@10mm.

The approach scales easily to multiple templates for increased robustness. However, our experimental results indicate that even with a single template, landmark localization performance approaches the level of supervised landmark detectors. This is due to the consistent structure of the human body.

Discussion

Our findings raise questions about the necessity of machine learning approaches for some tasks. It is well known that a machine learning model optimized for one region or modality is not optimal for other modalities and body regions. A recent study raised this issue for the registration task [11] and showed that traditional methods generalize better when no additional supervision is used.

Consistent Point Matching has more potential than what is presented here, such as obtaining an organ label for a query point. The algorithm could be extended to a full registration algorithm using multiple points. However, brute-force scaling does not yield a practical algorithm. Additionally, the current search operation considers only displacement values. In some body regions, such as the extremities, orientation is also important for finding the best matches. Thus, incorporating six degrees of freedom (three translational and three rotational parameters) in the search operation would increase the chance of finding the correct corresponding point. Further studies could investigate these more challenging anatomical regions.

Conclusion

We have presented the Consistent Point Matching method, which identifies anatomically similar locations between pairs of volumetric medical images. Somewhat surprisingly, without any training or data requirements, Consistent Point Matching achieves state-of-the-art performance at high computational speed and without the need for accelerator hardware, surpassing meticulously trained machine learning models while being generic across tasks, modalities, and body parts. We demonstrated its effectiveness on four different datasets. Further applications may enable additional tasks on medical images.

References

[1] Yerebakan, H. Z., Shinagawa, Y., Ranganath, M., Allen-Raffl, S. & Valadez, G. H. A hierarchical descriptor framework for on-the-fly anatomical location matching between longitudinal studies. In International Conference on Medical Image Computing and Computer-Assisted Intervention MTSAIL & LEAF Workshop, 59–68 (Springer, 2023).
[2] Bai, X. & Xia, Y. Sam++: Enhancing anatomic matching using semantic information and structural inference. arXiv preprint arXiv:2306.13988 (2023).
[3] Vizitiu, A. et al. Multi-scale self-supervised learning for longitudinal lesion tracking with optional supervision. In International Conference on Medical Image Computing and Computer-Assisted Intervention, 573–582 (Springer, 2023).
[4] Cai, J. et al. Deep lesion tracker: monitoring lesions in 4d longitudinal imaging studies. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 15159–15169 (2021).
[5] Bai, X. et al. Uae: Universal anatomical embedding on multi-modality medical images. arXiv preprint arXiv:2311.15111 (2023).
[6] Yan, K., Wang, X., Lu, L. & Summers, R. M. Deeplesion: automated mining of large-scale lesion annotations and universal lesion detection with deep learning. Journal of medical imaging 5, 036501 (2018).
[7] Codella, N. C. et al. Medimageinsight: An open-source embedding model for general domain medical imaging. arXiv preprint arXiv:2410.06542 (2024).
[8] Weikert, T. et al. Reduction in radiologist interpretation time of serial ct and mr imaging findings with deep learning identification of relevant priors, series and finding locations. Academic Radiology 30, 2269–2279 (2023).
[9] Yerebakan, H. Z., Shinagawa, Y. & Valadez, G. H. Real time multi organ classification on computed tomography images. In MICCAI Workshop on Data Engineering in Medical Imaging, 1–10 (Springer, 2024).
[10] Yan, K. et al. Sam: Self-supervised learning of pixel-wise anatomical embeddings in radiological images. IEEE Transactions on Medical Imaging 41, 2658–2669 (2022).
[11] Jena, R., Sethi, D., Chaudhari, P. & Gee, J. Deep learning in medical image registration: Magic or mirage? Advances in Neural Information Processing Systems 37, 108331–108353 (2025).