How to build an AI Datacentre

Play this quick word association game. What instantly comes into your head when we say Generative AI?

  1. ‘Cheating’ at report writing with ChatGPT
  2. Wishing you’d bought some NVidia stock 5 years ago
  3. Building a data centre hosting 10,000, or even a million, NVidia GPUs
  4. Something else?

Probably all of us can relate to 1 and 2. Maybe you’ve not given much thought to 3?

ChatGPT 3.5 was trained on about 10,000 NVidia graphics processing units (GPUs). GPT 4.0 uses double that. And future AI models may require not thousands, but millions of GPUs. How would you build that?

Those are boggling numbers, but you can chunk the problem down. The building block of any AI data centre is a relatively compact 256-GPU scalable unit – a pod.

Each pod has 8 compute racks and 2 networking racks. InfiniBand – high throughput, low latency – networks the GPUs within the pod (and pods to each other) using NVidia’s Quantum-2 switches.

The arithmetic within a pod is relatively easy (there’s a quick sketch of the sums after this list):

  • Every GPU in the pod requires 800G of bandwidth
  • InfiniBand NDR (Next Data Rate) does this by pairing 2 x 400G ports in a ‘4-lane’ arrangement
  • Each lane uses two fibres – one each for Tx and Rx
  • So each 800G connection requires 8 fibres – a Base-8 solution.
  • Each Quantum-2 switch supports 64 x 400G ports, with a total throughput of around 50 Tb/s each
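If you like to see those sums written out, here’s a minimal Python sketch of the pod-level arithmetic. The constants are simply the figures quoted above, not vendor specifications, so treat it as a back-of-the-envelope check rather than a design tool.

```python
# Back-of-the-envelope pod arithmetic, using the figures quoted above.

GPUS_PER_POD = 256          # one scalable unit (pod)
GBPS_PER_GPU = 800          # each GPU needs 800G of bandwidth
PORT_SPEED_GBPS = 400       # InfiniBand NDR port speed
FIBRES_PER_CONNECTION = 8   # Base-8: each 800G connection uses 8 fibres
PORTS_PER_SWITCH = 64       # Quantum-2: 64 x 400G ports per switch

ports_per_gpu = GBPS_PER_GPU // PORT_SPEED_GBPS        # 2 x 400G ports paired
patch_cords_per_pod = GPUS_PER_POD                     # one MPO cord per GPU
fibres_per_pod = GPUS_PER_POD * FIBRES_PER_CONNECTION  # 2,048 fibres

# ~51 Tb/s per switch when both directions are counted (hence "around 50 Tb/s")
switch_throughput_tbps = PORTS_PER_SWITCH * PORT_SPEED_GBPS * 2 / 1000

print(f"400G ports per GPU:       {ports_per_gpu}")
print(f"MPO patch cords per pod:  {patch_cords_per_pod}")
print(f"Fibres per pod:           {fibres_per_pod:,}")
print(f"Switch throughput (Tb/s): {switch_throughput_tbps:.1f}")
```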

Your first task is to connect each GPU to its switch. You’re going to need 256 patch cords, and passive MPO-MPO is where the industry is headed. The challenge comes when you scale: a fabric of, say, 16k GPUs, which networks 64 of these 256-GPU pods together.
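To put that scale into pod terms, a one-line sum (using the 256-GPU pod size above):

```python
TOTAL_GPUS = 16_384
GPUS_PER_POD = 256

print(f"Pods needed: {TOTAL_GPUS // GPUS_PER_POD}")  # 64 pods of 256 GPUs
```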

Firstly, we’ll need a ‘rail-optimised network architecture’. That’s essentially three layers of switching, highly meshed and distributed, for minimum latency and maximum throughput. Leaf switches – within each pod – interconnect the GPUs there. Spine switches interconnect the leaf switches. And core switches interconnect the spine switches.

There’s one number to remember: 16,384. For example:

  • GPUs we’re interconnecting = 16,384
  • Total connections between GPUs and their leaf switches = 16,384

So far we’ve not left the level of the pod and its leaf switches. However:

  • Connections between leaf and spine switches = 16,384
  • Connections between spine and core switches = 16,384
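Here’s a minimal sketch of why that single number keeps repeating. It assumes the simplification above: a non-blocking, rail-optimised fabric in which every tier carries one link per GPU; real topologies differ in the detail.

```python
# Link counts per tier in a simple non-blocking leaf/spine/core fabric,
# assuming one link per GPU at every tier (the simplification used above).

TOTAL_GPUS = 16_384

TIERS = ["GPU-to-leaf", "leaf-to-spine", "spine-to-core"]
for tier in TIERS:
    print(f"{tier:>13} links: {TOTAL_GPUS:,}")

print(f"Total links to cable: {len(TIERS) * TOTAL_GPUS:,}")  # 49,152
```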

Now we’re dealing with cable runs of between 50 and 500 metres. Pulling 16,384 MPO cords gives us a huge challenge with space, labour, sustainability and ongoing moves and changes. Now imagine doing it again for the core switch layer.

So the other revolution – not as sexy as AI perhaps, but no less important – is in structured cabling systems. Corning Edge 8 was designed specifically to simplify these ultra-high-speed, ultra-dense cabling challenges.

With Edge 8 we could use a two-patch-panel design and deploy 144-fibre trunk cables. One trunk cable replaces 18 MPO patch cords and can be pulled all at once through the data hall. Interconnecting one pod to its spine switches for a non-blocking fabric would take just 15 MPO trunks – versus 256 patch cords – and we’d have 14 Base-8 connections spare.
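As a quick sanity check on those numbers, here’s a small sketch of the trunk-versus-patch-cord sum. The 144-fibre trunk and Base-8 figures come from the description above, and it assumes one leaf-to-spine connection per GPU in the pod.

```python
import math

# Trunk-versus-patch-cord arithmetic for one pod's links to its spine switches.
FIBRES_PER_TRUNK = 144      # 144-fibre trunk cable
FIBRES_PER_CONNECTION = 8   # Base-8 connection
CONNECTIONS_PER_POD = 256   # one 800G link per GPU in the pod

base8_per_trunk = FIBRES_PER_TRUNK // FIBRES_PER_CONNECTION       # 18
trunks_needed = math.ceil(CONNECTIONS_PER_POD / base8_per_trunk)  # 15
spare = trunks_needed * base8_per_trunk - CONNECTIONS_PER_POD     # 14

print(f"Base-8 connections per trunk: {base8_per_trunk}")
print(f"Trunks per pod (to spine):    {trunks_needed}")
print(f"Spare Base-8 connections:     {spare}")
```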

Shrinking the volume of cabling, expanding the flexibility of the cabling system, and continually delivering innovation in simplicity, scalability and sustainability. That’s Corning’s contribution to our shared AI future.