Chip Letter Links No. 18: NexGen, Nvidia, Die Topology, FastGPT in Fortran and more

Great links, images and reading for 4 June 2023

Jun 04, 2023


Hi everyone and thanks for subscribing. This is one of our regular series of posts with links, images and articles of interest, inspired by Adam Tooze’s excellent Chartbook.

Each edition starts with a beautiful die image. This week it’s a NexGen RISC86 CPU courtesy of Fritzchens Fritz.

If you’ve enjoyed this edition of The Chip Letter then please share with your friends or on social media.

Nvidia Joins the Trillion Dollar Club

It’s been quite a week for Nvidia. They (briefly) joined the small group of tech companies with a market capitalisation of more than one trillion dollars.

To put this into context, Nvidia’s market cap is now greater than those of Intel, AMD and TSMC combined, and greater than that of any other semiconductor company in history.

Nvidia was the subject of the first half of this week’s Ones and Tooze podcast with Adam Tooze. It’s an interesting discussion of semiconductors in a geopolitical context.

Jensen Huang Commencement Speech

If it’s been a momentous week for Nvidia then it’s probably been a good week for Nvidia’s CEO Jensen Huang. He’s been busy unveiling the Grace Hopper ‘superchip’ but found time to give a commencement speech last week at National Taiwan University.

Transcript courtesy of the excellent Interconnected Substack.

Interconnected

Jensen Huang NTU Commencement Speech 2023

Jensen Huang, two days after Nvidia's historic quarterly earnings, delivered the commencement speech at National Taiwan University – one of the elite universities in Taiwan. He shared three near-death Nvidia stories – making the wrong architecture choice when building for its first customer Sega, betting the company on CUDA, and retreating from Android …

Kevin Xu

Delayed Branch Substack on Intel CPU Die Topology

It’s always great to find interesting new Substacks, even if they don’t post very often. I’m delighted to add Jason Rahman’s Delayed Branch to my recommendations.

The latest post, on Intel CPU die topology (how components connect together on a die), was a really interesting read.

Delayed Branch

Intel CPU Die Topology

Over the past 10-15 years, per-core throughput slowed down, and in response CPU designers have scaled up core counts and socket counts to continue increasing performance across generations of new CPU models. This scaling however is not free. When scaling a system to multiple sockets using NUMA (Non-Uniform Memory Access), software must generally take th…

Jason Rahman

And here is the follow-up on AMD CPU (Rome + Milan) dies.

Delayed Branch

AMD CPU Topology - Rome + Milan

Last time we covered Intel CPU inter-core interconnect topologies from the Cooperlake and Ice Lake server products. Next up, we’re going to cover the last two generations of AMD server CPUs, Rome + Milan. Core Count Scaling As the size of a physical CPU die increases, yield of working dies from a wafer decreases, and cost per processor rapidly increases…

Jason Rahman

Intel vs AMD

On the topic of Intel and AMD, regular Chip Letter readers will know that all Jon’s posts at The Asianometry Newsletter are self-recommending, but I think this week’s, on the rivalry between Intel and AMD, is particularly good.

The Asianometry Newsletter

Intel & AMD: The First 30 Years

If you want to watch the video, it is below…

Jon Y

As a bonus, it contains a link to an interesting discussion between Jon and the great Dylan Patel of SemiAnalysis.

Why is Rosetta 2 Fast?

The very first Rosetta - By © Hans Hillewaert, CC BY-SA 4.0, https://commons.wikimedia.org/w/index.php?curid=3153928

Apple’s transition from Intel to ‘Apple Silicon’ has been amazing. Not only were Apple’s Arm-based SoCs remarkable in their performance and power consumption when they were introduced, but Apple also seems to have handled the change seamlessly.

A lot of this is down to ‘Rosetta 2’, which enables Apple Silicon Macs to run Intel code. So I really enjoyed this post that explains why Rosetta 2 runs so quickly. From the conclusions:

Engineering is about making the right tradeoffs, and I’d say Rosetta 2 has done exactly that. While other emulators might require inter-instruction optimisations for performance, Rosetta 2 is able to trust a fast CPU, generate code that respects its caches and predictors, and solve the messiest problems in hardware.

The fact that this is possible seems like a serious weakness in Intel / AMD’s x86 moat.

Links: Blog Post

Apple’s Neural Engine

A modern Apple SoC contains not just CPUs and a GPU but also a Neural Engine for machine learning calculations.

Given the increasing focus on ‘on-device’ machine learning, it’s perhaps surprising that we don’t know more about these engines. But thanks to hackers like George Hotz and Matthijs Hollemans we can get a glimpse of this hardware and what it’s capable of:

The Apple Neural Engine is a fancy DMA Engine that is based around convolutions. We don't have all the details worked out yet, but we can do some things with it.

Expect to see a lot more discussion of accelerators on Apple’s and others’ hardware, possibly even at Apple’s WWDC this week.

Links: George Hotz’s GitHub, Matthijs Hollemans’s GitHub

After the paywall: GPT-2 in 300 lines of Fortran, why corporate America still runs on fragile ancient software, a fun online assembly language simulator, and semiconductor numbers everyone should know.
