Seems absurd on the face of it, doesn't it? But let's play this out.
Nvidia's RTX Spark combines an ARM CPU, a GPU, and RAM in a single system on chip (SoC). This is how Apple has done its silicon for a while now. But why is Nvidia doing this? Why now? One read is that they want to deliver a better personal computing platform to diversify away from their concentrated data center revenue stream. This may very well be the case.
That said, Nvidia makes most of its money from AI data centers. Those customers are buying up all the GPUs that Nvidia can manufacture. So why spend precious fab capacity on a laptop chip? Maybe the answer isn't about a better personal computing platform but about where Nvidia's customers are constrained. Satya Nadella said in an interview that in many instances he has GPUs ready to go but no power to turn them on. It is increasingly clear that Nvidia's US customers are electricity constrained. Nvidia's GPUs are famously power-hungry, topping out at a whopping kilowatt or so for a single GPU package.
This is not only a power draw challenge. It's also a cooling challenge. An increasing number of US cities and states are pushing back on data centers over water usage. That water usage mostly comes from open-circuit, evaporative cooling. You could flip around and use closed-loop dry cooling (in effect industrial A/Cs or heat pumps), but then you'd draw much more energy (Carnot puts a hard floor on the energy needed for cooling, and real equipment sits well above it). So that's the catch-22: more flops means more energy, and more energy means more cooling, which either stresses the water supply or taxes the already stretched grid.
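To put a rough number on the Carnot floor mentioned above, here is a minimal sketch. The temperatures and the 1 MW heat load are illustrative assumptions of mine, not figures from any real data center, and the function name is made up for this example.

```python
def carnot_min_cooling_power(heat_w, t_cold_k, t_hot_k):
    """Ideal (Carnot) power in watts needed to pump heat_w watts of heat
    from a cold side at t_cold_k to a hot side at t_hot_k (both in kelvin)."""
    if t_cold_k <= 0 or t_hot_k <= 0:
        raise ValueError("temperatures must be positive and in kelvin")
    if t_hot_k <= t_cold_k:
        # Outside is colder than the loop: heat flows out on its own,
        # so the ideal heat-pump work is zero.
        return 0.0
    # Ideal coefficient of performance for cooling: T_cold / (T_hot - T_cold)
    cop = t_cold_k / (t_hot_k - t_cold_k)
    return heat_w / cop


# Assumed, illustrative numbers: 1 MW of heat, an 18 C coolant loop,
# and 40 C outside air.
it_load_w = 1_000_000
floor_w = carnot_min_cooling_power(it_load_w, t_cold_k=291.15, t_hot_k=313.15)
print(f"Carnot floor: {floor_w / 1000:.1f} kW per MW of heat")
```

The floor grows with the gap between the loop and the outside air, and real chillers run well below the ideal Carnot efficiency, so actual draw is higher than what this prints.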
This makes the next optimization axis obvious. It must be flops per watt of power consumed. Guess who is the undisputed king of performance per watt? Apple! The M3 Ultra, for example, pushes nearly the same amount of flops as a desktop RTX 5070 while consuming roughly a sixth of the power on a full-machine basis (the Mac is an SoC, while our poor desktop also needs a CPU, PSU, fans, etc., so the comparison is not totally fair).
Another example of Nvidia's 'raw power' philosophy contrasted with Apple's is how they do memory. B200 gets close to 8 terabytes per second of memory bandwidth. That's an insane number, unheard of before! How does Nvidia achieve this? First you get exotic HBM3e stacks, then you put 8 of them on an 8K-bit bus. Nvidia squeezes out everything TSMC can do on the die AND on the packaging to deliver this monstrous performance. Apple? They take consumer-grade LPDDR5 (low-power mobile DRAM), nothing that requires exotic fabs or packaging, put it on a wide bus, and squeeze out 800GB/s on consumer-grade hardware. The price difference, and more crucially the performance-per-watt difference, is enormous.
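The two bandwidth figures above fall out of the same back-of-envelope formula: bus width times per-pin data rate. A minimal sketch follows. The per-pin rates and the 1024-bit widths are my assumptions, chosen to line up with the roughly 8 TB/s and 800GB/s numbers, and are not taken from official spec sheets.

```python
def peak_bandwidth_gb_s(bus_width_bits, gbps_per_pin):
    """Theoretical peak memory bandwidth in GB/s:
    (bus width in bits) x (data rate per pin in Gbit/s) / 8 bits per byte."""
    return bus_width_bits * gbps_per_pin / 8


# Assumption: 8 HBM3e stacks, each with a 1024-bit interface (the "8K bus"),
# running at about 8 Gbit/s per pin.
hbm_style = peak_bandwidth_gb_s(bus_width_bits=8 * 1024, gbps_per_pin=8.0)

# Assumption: LPDDR5 at 6.4 Gbit/s per pin on a 1024-bit bus.
lpddr_style = peak_bandwidth_gb_s(bus_width_bits=1024, gbps_per_pin=6.4)

print(f"HBM3e-style:  {hbm_style:,.1f} GB/s")
print(f"LPDDR5-style: {lpddr_style:,.1f} GB/s")
print(f"Ratio: {hbm_style / lpddr_style:.1f}x")
```

Seen this way, the gap comes from a bus eight times as wide plus a somewhat faster pin, and that extra width is what demands the exotic stacking and packaging.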
Apple historically has not liked the data center business. The customers are too concentrated for Apple to extract its premium margins. They are demanding and less keen on Apple's design aesthetic. But now may be a unique time in history when this is no longer fully true. To see why, think about the inverse framing of Moore's law: the marginal utility of computing has been halving roughly every 18 months. That's an astounding thought! We had all this increase in compute power and very little idea where to put it, so it just went into price decreases! Now, maybe for the first time since Moore's law has been in effect, we see that trend being bucked. Why? Because of the insane demand for AI, of course. For the first time, we are seeing enormously more flops produced AND the price per unit of computation (especially for frontier AI use cases, inclusive of power) going UP simultaneously! This means the margin is there for Apple to take if they so wish. As for the design aesthetic, Apple's matters here precisely because they had the "taste" to know the power of flops per watt (sorry for the pun) long before the rest of the market did.
So, if Apple wanted to, they could secure TSMC capacity, spend much of it on M5 Ultra SoCs with an absolutely gobsmacking amount of RAM on them, and sell those to data center customers at a markup. They're reportedly already sort of doing this for their own inference needs with the Baltra chips. These would probably sell very well for inference-type workloads that demand a lot of unified memory with decent bandwidth, extremely efficient flops per watt, and correspondingly lower cooling requirements. Should Apple ever consider that, RTX Spark allows Nvidia to sidestep it entirely. They now have an equivalent thing running CUDA cores, and their margin needs are only a tad worse than Apple's 😉.