SXM (socket)

SXM (Server PCI Express Module)^[1] is a high-bandwidth socket solution for connecting Nvidia compute accelerators to a system. Every generation of Nvidia Tesla accelerators since the P100, as well as the DGX computer series and the HGX board series, uses an SXM socket type that provides high bandwidth and power delivery for the GPU daughter cards.^[2] Nvidia also sells these combinations as end-user products, for example in its DGX system series. Socket generations include SXM for Pascal-based GPUs, SXM2 and SXM3 for Volta-based GPUs, SXM4 for Ampere-based GPUs, SXM5 for Hopper-based GPUs, and SXM6 for Blackwell-based GPUs. These sockets are used for specific models of these accelerators and offer higher performance per card than their PCIe equivalents.^[2] The DGX-1 system was the first to be equipped with SXM2 sockets, and thus the first to carry form-factor-compatible SXM modules with P100 GPUs; it was later shown to support upgrading to, or shipping pre-equipped with, SXM2 modules with V100 GPUs.^[3]^[4]

Technical details

SXM boards are typically built with four or eight GPU slots, although some solutions, such as the Nvidia DGX-2, connect multiple boards to deliver higher performance. While third-party SXM boards exist, most system integrators, such as Supermicro, use prebuilt Nvidia HGX boards, which come in four- or eight-socket configurations.^[5] This approach greatly lowers the cost and complexity of building SXM-based GPU servers and ensures compatibility and reliability across all boards of the same generation.^[citation needed]

Boards that host SXM modules, such as HGX boards (particularly recent generations), may include NVLink switches (NVSwitch) to allow faster GPU-to-GPU communication, further reducing the bottlenecks that CPU and PCIe limitations would otherwise impose.^[2]^[6] The GPUs on the daughter cards use NVLink as their main communication protocol.^[clarification needed] For example, a Hopper-based H100 SXM5 GPU can use up to 900 GB/s of total bidirectional bandwidth across 18 NVLink 4 links, each contributing 50 GB/s;^[7] in contrast, a PCIe 5.0 x16 slot provides up to 64 GB/s in each direction.^[8] This high bandwidth also allows GPUs to share memory over NVLink, so that an entire HGX board can appear to the host system as a single, massive GPU.^[9]
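The size of this gap can be checked with a few lines of arithmetic. The sketch below is an illustration rather than an Nvidia tool: the helper names are hypothetical, and it assumes standard PCIe 5.0 signalling (32 GT/s per lane with 128b/130b encoding, which the cited sources do not state) while ignoring packet and protocol overhead.

```python
# Rough comparison of H100 SXM5 NVLink bandwidth with a PCIe 5.0 x16 slot.
# NVLink figures come from the text; PCIe parameters are assumed standard
# PCIe 5.0 values. Packet/protocol overhead is ignored.

NVLINK4_LINKS = 18            # NVLink 4 links on an H100 SXM5
NVLINK4_GBPS_PER_LINK = 50.0  # GB/s per link, both directions combined

PCIE5_GT_PER_LANE = 32.0      # giga-transfers per second per lane (1 bit each)
PCIE5_LANES = 16              # x16 slot
PCIE5_ENCODING = 128 / 130    # 128b/130b line-code efficiency


def nvlink_total_gbps(links: int, per_link: float) -> float:
    """Aggregate NVLink bandwidth in GB/s, counting both directions."""
    return links * per_link


def pcie_per_direction_gbps(gt_per_lane: float, lanes: int, efficiency: float) -> float:
    """Usable PCIe bandwidth in GB/s for one direction.

    Without the encoding factor this is the raw 64 GB/s figure for x16.
    """
    return gt_per_lane * lanes * efficiency / 8


nvlink = nvlink_total_gbps(NVLINK4_LINKS, NVLINK4_GBPS_PER_LINK)
pcie_one_way = pcie_per_direction_gbps(PCIE5_GT_PER_LANE, PCIE5_LANES, PCIE5_ENCODING)
pcie_both = 2 * pcie_one_way  # count both directions, like the NVLink figure

print(f"NVLink 4, 18 links:     {nvlink:.0f} GB/s")
print(f"PCIe 5.0 x16, one way:  {pcie_one_way:.1f} GB/s")
print(f"PCIe 5.0 x16, both:     {pcie_both:.1f} GB/s")
print(f"NVLink / PCIe (both):   {nvlink / pcie_both:.1f}x")
```

Under these assumptions a x16 slot delivers roughly 63 GB/s per direction, or about 126 GB/s with both directions counted, so the H100's 900 GB/s NVLink aggregate is about seven times larger when both figures are measured the same way.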

Power delivery is also handled by the SXM socket, eliminating the need for the external power cables required by equivalent PCIe cards. This, combined with horizontal mounting, allows more efficient cooling, which in turn lets SXM-based GPUs operate at a much higher thermal design power (TDP). The Hopper-based H100, for example, can draw up to 700 W solely from the SXM socket.^[10] The lack of cabling also makes large systems much easier to assemble and repair, and reduces the number of possible points of failure.^[2]

The following tables compare the accelerators used in DGX systems, from the P100^[11]^[12]^[13] through the V100,^[14] A100,^[15] H100^[16] and B200^[17]^[18] to the R100.^[19]

General & Architecture

Model	Architecture	Socket	GPU	Fabrication Process	Transistor count (billion)	Die size (mm²)	Launched
P100	Pascal	SXM/SXM2	GP100	TSMC 16FF+	15.3	610	Q2 2016
V100 16GB	Volta	SXM2	GV100	TSMC 12FFN	21.1	815	Q3 2017
V100 32GB	Volta	SXM3	GV100	TSMC 12FFN	21.1	815	Q3 2017
A100 40GB	Ampere	SXM4	GA100	TSMC N7	54.2	826	Q1 2020
A100 80GB	Ampere	SXM4	GA100	TSMC N7	54.2	826	Q4 2020
H100	Hopper	SXM5	GH100	TSMC 4N	80	814	Q3 2022
H200	Hopper	SXM5	GH100	TSMC 4N	80	814	Q3 2023
B100	Blackwell	SXM6	GB100	TSMC 4NP	208	N/A	Q4 2024
B200	Blackwell	SXM6	GB100	TSMC 4NP	208	N/A	Q4 2024
R100	Rubin	SXM7	N/A	TSMC 3N	338	N/A	H2 2026

Cores, Clock & Power

Model	Boost clock (MHz)	#SM	Cores (FP32 CUDA)	Cores (FP64 excl. tensor)	Cores (Mixed INT32/FP32)	Cores (INT32)	TDP (W)
P100	1480	56	3584	1792	N/A	N/A	300
V100 16GB	1530	80	5120	2560	N/A	5120	300
V100 32GB	1530	80	5120	2560	N/A	5120	350
A100 40GB	1410	108	6912	3456	6912	N/A	400
A100 80GB	1410	108	6912	3456	6912	N/A	400
H100	1980	132	16896	8448	16896	N/A	700
H200	1980	132	16896	8448	16896	N/A	700
B100	N/A	N/A	N/A	N/A	N/A	N/A	700
B200	N/A	N/A	N/A	N/A	N/A	N/A	1000
R100	N/A	N/A	N/A	N/A	N/A	N/A	2300

Memory & Cache

Model	Memory Type (HBM)	VRAM Size (GB)	Memory speed per pin (Gb/s)	Bus width (bits)	Bandwidth (TB/s)	L1 Cache Per SM (KB)	L1 Cache Total (KB)	L2 Cache (KB)
P100	HBM2	16	1.4	4096	0.72	24	1344	4096
V100 16GB	HBM2	16	1.75	4096	0.9	128	10240	6144
V100 32GB	HBM2	32	1.75	4096	0.9	128	10240	6144
A100 40GB	HBM2	40	2.4	5120	1.52	192	20736	40960
A100 80GB	HBM2e	80	3.2	5120	2.04	192	20736	40960
H100	HBM3	80	5.2	5120	3.35	192	25344	51200
H200	HBM3e	141	6.3	6144	4.8	192	25344	51200
B100	HBM3e	192	8	8192	8	N/A	N/A	N/A
B200	HBM3e	192	8	8192	8	N/A	N/A	N/A
R100	HBM4	N/A	N/A	N/A	N/A	N/A	N/A	N/A
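The Bandwidth column follows from the bus width and the per-pin data rate. A minimal sketch of that relationship, using the table's own figures; the dictionary and function names are illustrative, and the result is a theoretical peak in decimal units (1 TB = 1000 GB).

```python
# Peak memory bandwidth from bus width and per-pin data rate:
#   bandwidth (GB/s) = bus width (bits) * per-pin rate (Gb/s) / 8 bits per byte
# Inputs are copied from the table; outputs are theoretical peaks.

MEMORY_SPECS = {
    # model: (bus width in bits, per-pin data rate in Gb/s)
    "P100": (4096, 1.4),
    "V100": (4096, 1.75),
    "A100 40GB": (5120, 2.4),
    "A100 80GB": (5120, 3.2),
    "H100": (5120, 5.2),
    "H200": (6144, 6.3),
    "B200": (8192, 8.0),
}


def peak_bandwidth_tbps(bus_bits: int, gbps_per_pin: float) -> float:
    """Theoretical peak bandwidth in TB/s (decimal units)."""
    return bus_bits * gbps_per_pin / 8 / 1000


for model, (bus_bits, rate) in MEMORY_SPECS.items():
    print(f"{model:10s} {peak_bandwidth_tbps(bus_bits, rate):5.2f} TB/s")
```

Small deviations from the Bandwidth column come from rounding of the per-pin rates; a large deviation would point to an inconsistent entry.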

Compute Performance, Interconnect & Networking

Model	FP32 (TFLOPS)	FP64 (TFLOPS)	INT8 dense tensor	FP16 dense tensor	bfloat16 dense tensor	TF32 dense tensor	FP64 dense tensor	Interconnect (NVLink; TB/s)	Networking
P100	10.6	5.3	N/A	21.2 TFLOPS	N/A	N/A	N/A	0.16	ConnectX-4 (100 Gb/s)
V100 16GB	15.7	7.8	N/A	125 TFLOPS	N/A	N/A	N/A	0.3	ConnectX-5 (100 Gb/s)
V100 32GB	15.7	7.8	N/A	125 TFLOPS	N/A	N/A	N/A	0.3	ConnectX-5 (100 Gb/s)
A100 40GB	19.5	9.7	624 TOPS	312 TFLOPS	312 TFLOPS	156 TFLOPS	19.5 TFLOPS	0.6	ConnectX-6 (200 Gb/s)
A100 80GB	19.5	9.7	624 TOPS	312 TFLOPS	312 TFLOPS	156 TFLOPS	19.5 TFLOPS	0.6	ConnectX-6 (200 Gb/s)
H100	67	34	1.98 POPS	990 TFLOPS	990 TFLOPS	495 TFLOPS	67 TFLOPS	0.9	ConnectX-7 (400 Gb/s)
H200	67	34	1.98 POPS	990 TFLOPS	990 TFLOPS	495 TFLOPS	67 TFLOPS	0.9	ConnectX-7 (400 Gb/s)
B100	N/A	N/A	3.5 POPS	1.98 PFLOPS	1.98 PFLOPS	989 TFLOPS	30 TFLOPS	1.8	ConnectX-7 (400 Gb/s)
B200	N/A	N/A	4.5 POPS	2.25 PFLOPS	2.25 PFLOPS	1.2 PFLOPS	40 TFLOPS	1.8	ConnectX-7 (400 Gb/s)
R100	N/A	N/A	N/A	N/A	N/A	N/A	N/A	N/A	ConnectX-9 (1600 Gb/s)
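The FP32 column can be reproduced from the Cores, Clock & Power table. The sketch below assumes the usual convention for quoting peak figures: every FP32 CUDA core completes one fused multiply-add (counted as two floating-point operations) per cycle at the listed boost clock. Sustained throughput in real workloads is lower, and the names used are illustrative only.

```python
# Peak (non-tensor) FP32 throughput from core count and boost clock:
#   TFLOPS = 2 FLOPs per fused multiply-add * FP32 CUDA cores * clock (Hz) / 1e12
# Assumes each core retires one FMA per cycle at the boost clock.

SPECS = {
    # model: (FP32 CUDA cores, boost clock in MHz), from the table
    "P100": (3584, 1480),
    "V100": (5120, 1530),
    "A100": (6912, 1410),
    "H100": (16896, 1980),
}

FLOPS_PER_FMA = 2


def peak_tflops(cores: int, boost_mhz: float) -> float:
    """Theoretical peak FP32 throughput in TFLOPS."""
    return FLOPS_PER_FMA * cores * boost_mhz * 1e6 / 1e12


for model, (cores, mhz) in SPECS.items():
    print(f"{model:5s} {peak_tflops(cores, mhz):5.1f} TFLOPS FP32")
```

This gives about 10.6, 15.7, 19.5 and 66.9 TFLOPS for the P100, V100, A100 and H100, matching the FP32 column after rounding.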

References

[1] Brown, W. Michael; Nguyen, Trung D.; Fuentes-Cabrera, Miguel; et al. (2012). "An Evaluation of Molecular Dynamics Performance on the Hybrid Cray XK6 Supercomputer". Procedia Computer Science. 9: 186–195. doi:10.1016/j.procs.2012.04.020.
[2] Kharya, Paresh (February 2, 2018). "Achieving Maximum Compute Throughput: PCIe vs. SXM2" (Press release). Nvidia. Retrieved March 31, 2022 – via TheNextPlatform.com.
[3] "Volta architecture whitepaper" (PDF). Nvidia.
[4] "DGX 1 User Guide" (PDF). Nvidia.
[5] Kennedy, Patrick (May 14, 2020). "Nvidia A100 4x GPU HGX Redstone Platform". ServeTheHome.com. Axautik Group. Retrieved December 30, 2025.
[6] "Nvidia NVLink and NVSwitch". Nvidia. Retrieved December 30, 2025.
[7] "Nvidia's H100 – What It Is, What It Does, and Why It Matters". DataCenterKnowledge.com. March 23, 2022. Retrieved March 31, 2022.
[8] "Is PCIe 5.0 Worth It? The Benefits of PCIe 5.0 (2022)". TechReviewer.com. Retrieved March 31, 2022.
[9] "Nvidia HGX A100: Powered by A100 GPUs and NVSwitch". Nvidia. Retrieved March 31, 2022.
[10] "Nvidia H100 GPU full details: TSMC N4, HBM3, PCIe 5.0, 700W TDP, more". TweakTown.com. March 23, 2022. Retrieved March 31, 2022.
[11] "NVIDIA Tesla P100". Nvidia.
[12] "NVIDIA Tesla P100 SXM2". TechPowerUp.
[13] "NVIDIA Tesla P100 PCIe 16 GB". TechPowerUp.
[14] Garreffa, Anthony (September 17, 2017). "NVIDIA Tesla V100 Tested: Near Unbelievable GPU Power". TweakTown.com. Retrieved December 30, 2025.
[15] Smith, Ryan (May 14, 2020). "NVIDIA Ampere Unleashed: NVIDIA Announces New GPU Architecture, A100 GPU, and Accelerator". AnandTech. Archived from the original on July 29, 2024.
[16] Smith, Ryan (March 22, 2022). "NVIDIA Hopper GPU Architecture and H100 Accelerator Announced: Working Smarter and Harder". AnandTech. Archived from the original on September 23, 2023.
[17] "B100 vs B200: Which NVIDIA Blackwell GPU is right for your AI workloads?". Northflank Blog. Retrieved June 15, 2026.
[18] "Comparing Blackwell vs Hopper | B200 & B100 vs H200 & H100". Exxact Blog. exxactcorp.com. Retrieved June 15, 2026.
[19] Mitrasish. "NVIDIA Rubin R100 GPU Chip Specs: Architecture, VRAM, and Cloud Availability (2026)". Spheron Blog. Retrieved June 13, 2026.

External links

Erlangen National High Performance Computing Center page on high performance computing with 4x and 8x A100 per computer node, also showing switch topology dumps