This year, at new car launch events, every CEO must dedicate a slide to computing power.
Over the past month and a half, NIO, XPeng, and Li Auto entered the mainstream four-digit computing power club on three separate evenings: the NIO ES6/EC6 popularized the self-developed Shenji chip; the Li Auto L Series Smart Refresh Edition became the first to deliver NVIDIA’s Thor chip; and the XPeng G7 debuted with the in-house Turing chip.
In 2025, all three are upgrading their next-gen intelligent hardware and software, embarking on another round of competition in smart assisted driving and digital cockpits—each taking distinct paths toward the same goal.
“Distinct paths” refers to their differing approaches to distributing computing power as they stride into the era of four-digit sparse inference TOPS per vehicle. “Same goal” means this rapidly advancing computing power will follow similar technological trajectories to enable more powerful assisted driving and smarter cockpits.
But questions arise:
What tangible upgrades do users gain from higher computing power?
Why are automakers racing to lead in this field?
Does more computing power truly make cars “smarter”?
While definitive answers remain elusive, we can explore the trends, concepts, and directions.
The Numbers Game
Twenty years ago, gasoline cars competed on horsepower; today, electric vehicles battle over computing power.
This isn’t to dismiss horsepower but to highlight how computing power now defines an automaker’s future.
Over the past 18 months, NIO, XPeng, and Li Auto pursued breakthroughs in computing power and capability—each prioritizing different technical paths, as reflected in their chip choices.
On June 11, at the XPeng G7 launch, CEO He Xiaopeng showcased the Turing chip’s prowess with this slide: a total sparse computing power of over 2,200 TOPS, delivered by three Turing chips (≈750 TOPS each).

A year and a half earlier, at NIO Day 2023, NIO unveiled the Shenji NX9031 chip, claiming “one chip rivals four”—that is, four Orin-X chips. This implies a single Shenji chip delivers ≈1,000 TOPS of sparse computing power.
The ET9, the first model equipped with Shenji, carries two 9031 chips, enabling millisecond-level hot redundancy and a combined sparse computing power of ≈2,000 TOPS.

Side note on NIO: Models like the new “5566” series, running the Cedar-S (“Standard”) system, feature a single 8295 + single Shenji on a shared domain controller (BOX). Thus, Cedar-S vehicles offer ≈1,000 TOPS for assisted driving—though real-world performance varies (discussed later).
In May, Li Auto began deliveries of its L Series Smart Refresh Edition and new MEGA, becoming the first automaker to mass-produce vehicles with Thor-U chips.
Calculated by Tensor Core (excluding CUDA cores), a single Thor-U delivers ≈700 TOPS; dual chips reach 1,400 TOPS.
Another note: Orin-X’s effective sparse computing power is ≈200 TOPS when counting only Tensor Cores.
Comparing rivals: Tesla HW4 offers ≈720 TOPS dense computing power (sparse ≈2×); Horizon’s J6P chip delivers 560 TOPS sparse.
Superficially, we might rank them: Triple Turing > Dual Shenji > Dual Thor-U > Single Shenji—but is this accurate?
Statements from other players reveal a more nuanced picture.
The Memory Bandwidth Bottleneck
At Tesla’s 2024 earnings call (Jan. 30), AI VP Ashok Elluswamy identified “onboard memory bandwidth” as the core constraint limiting the “context” capacity of FSD’s AI model.
“Context” defines the scope of information a large language model (LLM) processes in one go. For the vision-language models used in assisted driving, it corresponds to real-world environmental data.
Memory bandwidth—the invisible parameter constraining every assisted-driving model—lurks behind the computing power numbers.
NVIDIA’s Orin-X, which dominated the 2021–2024 premium ADAS market, offers 205 GB/s memory bandwidth per chip.

Estimating under int8 precision, with the full weight set read once per forward pass: Orin-X would need ≈50ms to process data for a 14B-parameter model—far exceeding the 20–30ms latency budget for assisted driving.
Hence, Orin-era automakers capped onboard models at ≤4B parameters (e.g., Li Auto’s VLM model: 2.2B) to prioritize low latency.
In fact, low latency is a safety imperative; memory bandwidth is often the bottleneck for assisted-driving AI capabilities.
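The latency logic above can be sketched as a simple lower bound: in a bandwidth-bound regime, each forward pass must stream the full weight set from memory at least once. A back-of-envelope estimate (assumptions mine: one full int8 weight read per pass, no compute overlap or activation traffic, so the exact figure will differ from the article's ≈50ms):

```python
# Bandwidth-bound lower bound on per-pass latency:
#   latency >= (params * bytes_per_weight) / memory_bandwidth
# Illustrative figures only; real pipelines overlap compute with
# memory traffic and add activation/KV reads on top.

def min_latency_ms(params: float, bytes_per_weight: float, bandwidth_gbps: float) -> float:
    """Lower bound on single-pass latency, in milliseconds."""
    return params * bytes_per_weight / (bandwidth_gbps * 1e9) * 1e3

# Orin-X: 205 GB/s, 14B-parameter model at int8 (1 byte per weight)
orin_x = min_latency_ms(14e9, 1.0, 205)
print(f"Orin-X, 14B int8: {orin_x:.0f} ms")  # well above a 20-30 ms budget
```

Whatever the exact constant factors, the bound alone already exceeds the 20–30ms requirement, which is why Orin-era models stayed small.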
Thor-U’s bandwidth? 273 GB/s. The unreleased Thor-X flagship reaches 546 GB/s. This 33% boost over Orin-X enables larger onboard models.
But two issues arise: Thor-U’s computing power leap far outpaces its bandwidth gain, and its bandwidth is unremarkable versus rivals’.
On Thor’s asymmetric progress, Zhuoyu Tech’s AI chief Chen Xiaozhi explained: “When running assisted-driving models, Thor-U faces memory bottlenecks as severe as Orin-X.”
In short, world-class chip power is strangled by insufficient memory bandwidth.
What about NIO and XPeng?
XPeng hasn’t disclosed Turing’s bandwidth, but with 64GB memory per chip and rumored 256-bit bus width (using LPDDR5X-8533), total bandwidth likely matches Thor-U at 273 GB/s.
NIO, however, doubled it: Shenji boasts 546 GB/s via a 512-bit LPDDR5X interface—on par with NVIDIA’s flagship Thor-X.
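These peak figures follow directly from bus width times DRAM transfer rate. A quick sketch (the 256-bit Turing configuration is the rumor cited above, not a confirmed spec; 8533 MT/s is LPDDR5X's top data rate):

```python
# Peak DRAM bandwidth = (bus_width_bits / 8) bytes * transfer_rate (MT/s)
def peak_bandwidth_gbps(bus_bits: int, mts: int) -> float:
    """Theoretical peak memory bandwidth in GB/s."""
    return bus_bits / 8 * mts * 1e6 / 1e9

print(peak_bandwidth_gbps(256, 8533))  # ~273 GB/s (rumored Turing / Thor-U class)
print(peak_bandwidth_gbps(512, 8533))  # ~546 GB/s (Shenji / Thor-X class)
```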

546 GB/s theoretically supports a 32B non-MoE model within 30ms latency (excluding inference optimizations).
Note: Bandwidth discussed is per-chip, as ADAS chips communicate via PCIe/Ethernet (slower than memory bandwidth).
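To see what 546 GB/s buys, invert the bandwidth-bound reasoning: the largest weight set a chip can stream within a latency budget is bandwidth times budget. A sketch under assumed weight precisions (my assumption, since the article doesn't state precision; the 32B figure lines up with roughly 4-bit weights):

```python
# Largest model streamable within a latency budget, bandwidth-bound:
#   max_params = bandwidth * budget / bytes_per_weight
def max_params_b(bandwidth_gbps: float, budget_ms: float, bytes_per_weight: float) -> float:
    """Max parameter count (in billions) streamable within the budget."""
    return bandwidth_gbps * 1e9 * budget_ms * 1e-3 / bytes_per_weight / 1e9

# 30 ms budget; per-chip bandwidths as discussed in the text
for name, bw in [("Orin-X", 205), ("Thor-U / Turing (est.)", 273), ("Shenji / Thor-X", 546)]:
    print(f"{name}: ~{max_params_b(bw, 30, 1.0):.1f}B at int8, "
          f"~{max_params_b(bw, 30, 0.5):.1f}B at int4")
```

Under these assumptions, 546 GB/s supports roughly 16B parameters at int8 or ~33B at int4 within 30ms, which is consistent with the 32B claim above.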
Conclusion: NIO’s Shenji nears U.S. export restriction limits; XPeng’s Turing aligns with mainstream trends (≈ Thor-U). Both sub-300GB/s solutions face memory constraints—a universal challenge requiring software optimization.
Another aside: Next-gen L5 chips will need HBM memory, but 2023–2025 U.S. restrictions block its export. Hence, NIO’s approach skirts the limit.
Finally, two closely guarded specs for the spec-sheet wars (“cyber cricket fights”): what are Huawei ADS’s and Tesla HW4’s memory bandwidths?
Per blogger @WancheDan’s teardown, Huawei’s MDC610 (200 TOPS dense, 2020 mass production) has a 384-bit bus. For context, Tesla’s 2019 FSD chip: only 128-bit.
Rumors suggest MDC610 achieved 308 GB/s (using low-frequency LPDDR5)—still higher than Thor-U.
As for Tesla HW4, Munro Live/@Greentheonly confirmed 8 Micron GDDR6 modules, yielding ≈448 GB/s bandwidth.
Thus, until Shenji’s 2023 launch, Tesla held the bandwidth crown. Shenji now temporarily claims it.
How to Utilize Massive Computing Power?
Two angles remain:
How do they allocate computing power?
Key facts:
Theoretical total TOPS ≠ actual ADAS TOPS. Allocation (hardware/software) reflects each company’s technical strategy.
Take NIO’s NT2 platform: four Orin-X chips (1,016 TOPS total). But slow interconnects prevent shared tasks/memory, so NIO splits them “2+1+1”: dual Orin for ADAS, one for cabin-driving integration and swarm intelligence, and one for safety redundancy.
In the Shenji era, NIO diverges:
“5566” models (NT2 platform) use single Shenji (≈1,000 TOPS, Cedar-S system).
Flagship ET9 (NT3 platform) runs dual Shenji (2,000 TOPS) + exclusive hardware (e.g., 3 LiDARs, active suspension).
Whether the upcoming NT3 ES8 retains dual Shenji—and how it splits 2,000 TOPS—remains key.
Allocation shapes user experience and determines real-world efficacy beneath the numbers.
NVIDIA’s Thor supports real-time partitioning (e.g., dedicating Tensor Cores to cockpit AI). Post-Orin “cabin-driving integration” means next-gen assistants (e.g., Li Auto’s “Ideal Classmate”) won’t be cockpit-chip limited.
But XPeng’s move is bolder: one of three Turing chips (≈750 TOPS) is reserved solely for AI cockpit—leaving ≈1,500 TOPS (≈ dual Thor-U) for ADAS.

This hardware commitment signals XPeng’s intent to reclaim its “smart cabin” leadership.
How are chips designed?
Allocation isn’t just about power—it’s about silicon real estate.
Chip design reflects strategic bets on future tech. Every mm² (costing millions) shapes a car’s intelligence for years.
Take the ISP (Image Signal Processor), an unsung hero in ADAS chips. While Tensor Cores grab headlines for TOPS, ISPs preprocess raw camera data (noise reduction, HDR, IR handling)—critical for perception.
Automakers now diverge sharply on ISP importance:
Elon Musk’s “anti-ISP” stance: Since 2022, he has argued that ISPs add latency and discard photon-level information. By FSD Beta 10.8 (Dec 2021), Tesla had cut photon-to-control latency by 20%. FSD v12 now processes raw 36Hz feeds from 8 HW4 cameras. Though unconfirmed via die shots, Tesla likely removed ISP functions. Our China road tests (Feb 2025 onward) confirm FSD’s ultra-low latency leads the industry.
Rivals’ ISP arms race: XPeng’s Turing packs two dedicated ISPs per chip—one for AI perception, one for image synthesis—to optimize low-light/rain/backlit scenes. Despite sharing TSMC nodes with Tesla HW4, this design split will yield divergent ADAS experiences, even within the “vision-only” camp.
NVIDIA & NIO: “big vs. bigger”: Thor-U upgraded ISP from 1.84 Gpps (Orin-X) to 3.5 Gpps (Giga pixels per second)—modest versus its TOPS leap. NIO’s Shenji, however, crammed in a “mad” 6.5 Gpps ISP (≈1.9× Thor-U, ≈2.7× Mobileye EyeQ Ultra’s 2.4 Gpps). Early benefits include brighter blind-spot/streaming mirror feeds. The real test: Can such ISPs match Tesla’s raw-data approach in real-world NWM (Neural World Model) performance?

Much remains unexplored, but ultimately, automakers’ and AI giants’ macro-decisions—crystallized in 200–300 mm² chips—will reshape five-meter-long vehicles and move trillion-dollar industries.
This is “mustard seeds containing vast universes”—a convergence of technology and philosophy where minute shifts trigger global ripples.
As this article finalized, Tesla’s next-gen AI5 hardware specs leaked: dual-sourced (TSMC N3P + Samsung 3GAA), targeting 2,000–2,500 TOPS.
Thus, NIO-XPeng-Li Auto’s charge is merely the opening act. The true climax of this computing power war—reshaping assisted and autonomous driving for the next decade—will be a battle royale of giants.