A Costly Upgrade to Spain’s “Second Largest Supercomputer”

Jul 26, 2024

A couple of days ago, I came across the news about the upgrade of “the second-largest supercomputer in Spain”, Caléndula, in Castilla y León. The project has cost €20M.

The announcement by the president of the region was quite eye-catching:

The supercomputer is able to read Don Quixote 12,000 times per second

The Supercomputing Center receives more than 14 million euros of investment in 2024

And… what should I say? 12,000 Quixotes… that sounds smart.

Whatever the metric above means (I am totally unable to back-calculate it), let’s look at this with a critical eye.

Maximum Capacity
Caléndula’s maximum computational capacity is 7,000 TFLOPS. Assuming it is operated at a model FLOPs utilization (MFU) similar to Meta’s (40%), it delivers at best 2,800 TFLOPS.

Here are a couple of very direct comparisons to put this into perspective and, in all honesty, embarrass whoever is responsible for this press release.

Time to Train Llama 3.1
Llama 3.1 405B (the big release of the week) had a total computational effort of 3.8 x 10²⁵ FLOPs. At an effective 2,800 TFLOPS (40% of the 7,000 TFLOPS peak), it would take Caléndula around 430 years to train an LLM as big as Llama 3.1. You’d better sit down.
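For anyone who wants to check the arithmetic, here is a minimal sketch using only the figures quoted above. The constant and function names are illustrative, and the 40% MFU is the assumption stated earlier, not a measured value for Caléndula. The figure of around 430 years corresponds to the 40% MFU case; running at 100% of peak is shown only for comparison.

```python
# Back-of-the-envelope check of the training-time estimate.
# All inputs are figures quoted in the post; names are illustrative.

PEAK_TFLOPS = 7_000            # Caléndula's quoted peak capacity
MFU = 0.40                     # assumed utilization, similar to Meta's
LLAMA_TRAINING_FLOP = 3.8e25   # total compute quoted for Llama 3.1 405B
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_train(total_flop, peak_tflops, mfu):
    """Years needed to perform total_flop at peak_tflops * mfu."""
    effective_flop_per_second = peak_tflops * 1e12 * mfu
    return total_flop / effective_flop_per_second / SECONDS_PER_YEAR


years_effective = years_to_train(LLAMA_TRAINING_FLOP, PEAK_TFLOPS, MFU)
years_at_peak = years_to_train(LLAMA_TRAINING_FLOP, PEAK_TFLOPS, 1.0)

print(f"At 40% MFU:      {years_effective:.0f} years")
print(f"At 100% of peak: {years_at_peak:.0f} years")
```

Even in the unrealistic best case of perfect utilization, the answer is still well over a century.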

What if we rented that capacity?
You could argue then that the goal of the supercomputer is not to create an LLM at the end of the day; not everything is LLMs, right? Let’s look at it from an economic perspective.

The capacity of Caléndula is equivalent to seven H100 GPUs at Meta’s rates (2,800 TFLOPS divided by 400 TFLOPS per H100, as per Table 4 of the technical report).

At Luzia, we rent H100s for experimentation at around $4/hour. That means we would pay circa $28/hour, or $672/day, to match Caléndula’s performance. The obvious follow-up is that the €20M budget, taking euros and dollars at rough parity, would buy us around 82 years’ worth of GPU compute…
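The rental numbers can be reproduced with a few lines. This is a rough sketch under the post’s own figures; the variable names are made up for illustration, and treating the €20M budget as $20M is a simplifying assumption.

```python
# Rough rental comparison using the figures quoted in the post.
EFFECTIVE_TFLOPS = 2_800       # Caléndula at the assumed 40% MFU
H100_TFLOPS = 400              # per-GPU rate quoted from the technical report
RENT_USD_PER_GPU_HOUR = 4      # approximate rental price quoted in the post
BUDGET = 20_000_000            # assumption: euros and dollars at parity

h100_equivalents = EFFECTIVE_TFLOPS / H100_TFLOPS
cost_per_hour = h100_equivalents * RENT_USD_PER_GPU_HOUR
cost_per_day = cost_per_hour * 24
years_of_rental = BUDGET / (cost_per_day * 365)

print(f"H100 equivalents: {h100_equivalents:.0f}")
print(f"Rental cost:      ${cost_per_hour:.0f}/hour, ${cost_per_day:.0f}/day")
print(f"Budget lasts:     {years_of_rental:.1f} years")
```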

But what if we don’t want to depend on Amazon? What if we bought the capacity in H100s instead? Each H100 sells for around $25k, which means we could assemble a cluster of 7 H100s with similar capacity for around $175k.

I could be off by a full order of magnitude, and the conclusion would still hold. The announcement is embarrassing and makes no sense at all.
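To make the order-of-magnitude claim concrete, here is a small sensitivity sketch. It assumes the same quoted figures (7 H100 equivalents, $25k per GPU, $4 per GPU-hour) and again treats the €20M budget as $20M; `error_factor` is a hypothetical knob for how badly the GPU count might be underestimated.

```python
# Sensitivity check: what if the required GPU count is underestimated?
BUDGET = 20_000_000             # assumption: euros and dollars at parity
H100_EQUIVALENTS = 7            # from 2,800 TFLOPS / 400 TFLOPS per H100
H100_PRICE_USD = 25_000         # approximate purchase price quoted in the post
RENT_USD_PER_GPU_HOUR = 4       # approximate rental price quoted in the post


def purchase_cost(error_factor):
    """Cost of buying error_factor times the estimated number of H100s."""
    return H100_EQUIVALENTS * error_factor * H100_PRICE_USD


def rental_years(error_factor):
    """Years of rental the budget covers at error_factor times the GPUs."""
    cost_per_year = (H100_EQUIVALENTS * error_factor
                     * RENT_USD_PER_GPU_HOUR * 24 * 365)
    return BUDGET / cost_per_year


for factor in (1, 10):
    print(f"x{factor}: buy for ${purchase_cost(factor):,}, "
          f"or rent for {rental_years(factor):.1f} years")
```

Even with ten times as many GPUs, buying them would cost a small fraction of the budget, and renting them would still be covered for several years.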

When you go deeper down the rabbit hole, you see that most of the budget has been spent on upgrading the university’s network infrastructure, increasing security, and constructing a 1,500 m² building. Which is fine; still a lot of money, but fine. That said, whatever the real total amount spent on the supercomputer upgrade is (I can’t figure it out), it’s clearly a waste of money. The researchers at the center would for sure be MUCH better off using GPUs as a service from one of the many reliable online providers.

Alvaro Higes
