Are Chiplets Enough to Save Moore’s Law?

June 2, 2023

During a press conference at Computex this week in Taiwan, Nvidia CEO Jensen Huang and MediaTek CEO Rick Tsai announced that Nvidia would supply GPU chiplets, along with Nvidia AI and graphics IP, to MediaTek for incorporation into a yet-to-be-designed system-on-chip (SoC) for in-cabin automotive applications.

Chiplets are not new to Nvidia. This announcement also adds a bit more validation for chiplets as a concept—one that many semiconductor makers are counting on to help keep Moore’s Law alive for the next several years.

In Taiwan this week, MediaTek CEO Rick Tsai, left, and Nvidia CEO Jensen Huang, right, announced that MediaTek would incorporate Nvidia GPU chiplets into an SoC designed for in-cabin automotive applications. (Source: Nvidia)

The idea behind chiplets is hardly new. The industry has been making multichip modules (MCMs) for decades: Mostek, for example, put two MK4116 16-Kbit DRAM chips in a dual-cavity ceramic package to create the MK4332D 32-Kbit DRAM back in 1979. Intel also mated a CPU chip and an SRAM chip in the Pentium Pro, introduced in late 1995. These MCMs allowed Mostek and Intel to transcend the limitations of their semiconductor processes to create packaged devices that were “more than Moore.”

So, co-packaged semiconductors in the form of MCMs have been around for quite a while, and chiplet technology is, in many ways, just an extension of the MCM concept—albeit with a lot more technology.

Perhaps the earliest use of contemporary chiplet technology is the Xilinx Virtex-7 2000T FPGA, introduced in late 2011. That FPGA, and the Xilinx Virtex-7 580HT introduced shortly after, employed a chiplet-on-silicon–interposer technology co-developed by Xilinx and Taiwan Semiconductor Manufacturing Co. (TSMC). That silicon interposer technology has evolved and is still available from TSMC, and it is now called CoWoS (Chip on Wafer on Substrate).

Chiplets’ two biggest advantages

The Xilinx Virtex-7 2000T and 580HT demonstrated two of the biggest advantages that chiplets provide.

For the Virtex-7 2000T, the assembly of four 28-nm FPGA chiplets into one package using a silicon interposer allowed Xilinx to build a much larger FPGA than would be possible with a monolithic 28-nm die. The interposer allows a semiconductor maker to exceed the reticle limit of a wafer stepper by assembling large die into a mosaic that is larger than is possible with one die.
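The reticle-limit argument comes down to simple arithmetic, which the minimal sketch below illustrates. It assumes a maximum exposure field of 26 mm × 33 mm, a commonly cited figure for current lithography tools that does not come from this article. The function name and the 2,000-mm² target are hypothetical and are not the dimensions of any Xilinx device.

```python
import math

# Assumption: maximum exposure field of 26 mm x 33 mm (about 858 mm^2).
RETICLE_LIMIT_MM2 = 26 * 33


def min_chiplets(target_area_mm2: float,
                 max_die_area_mm2: float = RETICLE_LIMIT_MM2) -> int:
    """Smallest number of dies, each no larger than max_die_area_mm2,
    whose combined silicon area reaches target_area_mm2."""
    return math.ceil(target_area_mm2 / max_die_area_mm2)


if __name__ == "__main__":
    # Hypothetical target: 2,000 mm^2 of logic in one package.
    print(min_chiplets(2000))
```

A target that fits under the limit needs one die. Anything above it has to be split across several dies and reassembled on the interposer.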

The Virtex-7 580HT deleted one of the Virtex-7 2000T’s four FPGA chiplets and replaced it with a 28Gbps transceiver chiplet, at a time when it was not possible to build 28Gbps transceivers using the mainstream 28-nm digital CMOS process used to manufacture the FPGA chiplets.

Consequently, the second advantage that chiplets deliver is the ability to mix and match die that have been fabricated using different process nodes, quite possibly from different foundries. Important process nodes that differ notably from mainstream and leading-edge digital process nodes include analog processes, memory processes (such as DRAM processes, especially in the form of high-bandwidth memory (HBM) stacks), and high-current or high-voltage processes, especially exotic ones such as gallium arsenide (GaAs) for photonics and silicon carbide (SiC) for power semiconductors.

Limited use so far

However, the ecosystem for commercial chiplets—one where a marketplace of chiplets from many vendors can be incorporated into a multichip SoC by multiple packaging vendors with mix-and-match ease—has yet to appear.

The use of chiplets has been largely restricted to individual chipmakers like AMD, which completed its acquisition of Xilinx in 2022 and adopted its chiplet technology; and Intel, which first employed its own proprietary EMIB (embedded multi-die interconnect bridge) and AIB (Advanced Interface Bus) chiplet-packaging technologies for Stratix 10 FPGAs, launched in 2016.

In both AMD’s and Intel’s cases, chiplets have proved so successful that the use of chiplet technologies has now spread throughout the companies’ respective product lines, including their flagship processor products.

In the most extreme example, Intel created an IC with more than 100 billion transistors in the package by incorporating 47 active chiplets (Intel prefers to call them “tiles”) into the design of its Ponte Vecchio GPU (now called the Data Center GPU Max Series) for high-performance computing applications. That is not currently feasible with a monolithic chip.

The Intel Ponte Vecchio GPU, now called the Data Center GPU Max, incorporates 47 active chiplets in one package, for a total of more than 100 billion transistors. (Source: Intel)

Interface standards lacking

One of the things holding back the broad commercialization of chiplets is the lack of physical and electrical interface standards.

Intel made AIB available as an open-source standard, which has now been formalized by the CHIPS Alliance consortium, but there are other competing proposals. Two leading chiplet interface standards are the strangely named “bunch of wires” (BoW), an open die-to-die (D2D) interconnect specification advocated by the Open Compute Project (OCP) Foundation, and the Universal Chiplet Interconnect Express (UCIe), another open specification for D2D interconnect, co-developed by AMD, Arm, ASE Group, Google Cloud, Intel, Meta, Microsoft, Qualcomm, Samsung and TSMC.

When Intel CEO Pat Gelsinger discussed his company’s participation in the UCIe Consortium at last year’s Intel Innovation event, the consortium had 80 members. Just a few months later, that number has risen to more than 100 member companies.

Interface wiring specifications are one thing, but the high-speed SerDes PHY—the physical-layer signaling specification needed to push bits over those wires at multi-Gbps rates—is quite another. The obvious serial protocol candidates—Ethernet and PCIe—are both designed for operation over much longer signal paths than what is needed for D2D interconnects. Consequently, existing package-to-package, board-to-board, and box-to-box signaling schemes consume far too much power per bit transferred and are therefore considered unsuitable as D2D interconnect standards.

Several IP companies, including Innosilicon, Cadence and Synopsys, offer high-speed PHY IP for D2D communications. One new entrant in the UCIe PHY race, Eliyan, recently released the results of its first silicon realization of its NuLink D2D PHY IP.

Eliyan’s PHY technology focuses on three critical factors for D2D interconnect: per-lane bandwidth, power consumption per bit transferred, and bit-rate performance over distance for organic substrates.

Eliyan recently completed tests of its first silicon test chiplet with the current NuLink PHYs. The test chiplets are implemented in TSMC’s N5 CMOS process technology and integrate four channels. Each channel has 16 bit lanes and one clock signal pair. Eliyan assembled 10 of these test chiplets onto an organic substrate as five transmit/receive pairs with different spacing between each pair, to test the reach of the NuLink PHY over the organic substrate.

The spacings between transmit/receive chiplet pairs are 19-21.5mm, 15-17.5mm, 10-12.5mm, 5-7.5mm, and 2-4.5mm. The variability in spacing between the pairs represents the varied locations among the chiplets’ signal line bumps for each lane.

Eliyan has tested its UCIe PHY with test chiplets manufactured with TSMC’s N5 CMOS process on this organic test substrate, which has five transmit/receive chiplet pairs at varying separation distances. (Source: Eliyan)

These test chiplets achieved 28.8Gbps per lane in unidirectional operation and 32Gbps per lane in bidirectional operation (16Gbps in each direction, simultaneously) over all the separation distances on the test substrate. Based on the resulting eye diagrams, Eliyan believes it can tweak the electrical operating parameters for its PHY transceivers to boost speeds to 32Gbps and 40Gbps, respectively, for unidirectional and bidirectional operation. For 16Gbps, unidirectional operation, the power consumption is 0.43 pJ/bit. For 32Gbps, bidirectional operation (16Gbps in each direction), the power consumption is 0.52 pJ/bit.
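To put the per-lane figures in context, here is a rough sketch of the aggregate bandwidth and interface power they imply for one test chiplet. The lane counts and the two rate/energy pairs are the ones quoted above. Treating all 64 bit lanes as active at the quoted rate, and ignoring clock lanes and protocol overhead, are simplifying assumptions. The function names are illustrative only.

```python
# Back-of-the-envelope link budget for a die-to-die (D2D) interface.
# Lane counts and per-lane figures come from the test results quoted above.
# Assumption: every bit lane runs at the quoted rate; clock lanes and
# protocol overhead are ignored.

CHANNELS = 4             # channels per test chiplet
LANES_PER_CHANNEL = 16   # bit lanes per channel


def aggregate_gbps(per_lane_gbps: float) -> float:
    """Total throughput across all bit lanes, in Gbps."""
    return CHANNELS * LANES_PER_CHANNEL * per_lane_gbps


def link_power_watts(per_lane_gbps: float, pj_per_bit: float) -> float:
    """Interface power = bits per second x energy per bit."""
    bits_per_second = aggregate_gbps(per_lane_gbps) * 1e9
    return bits_per_second * pj_per_bit * 1e-12


if __name__ == "__main__":
    cases = [
        ("unidirectional, 16 Gbps/lane", 16.0, 0.43),
        ("bidirectional, 32 Gbps/lane", 32.0, 0.52),
    ]
    for label, rate, energy in cases:
        print(f"{label}: {aggregate_gbps(rate):.0f} Gbps, "
              f"{link_power_watts(rate, energy):.2f} W")
```

Under these assumptions, the 64 lanes carry about 1 Tbps at roughly 0.44 W in the unidirectional case and about 2 Tbps at roughly 1.06 W in the bidirectional case.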

Until the UCIe Consortium develops the requisite standards—including a standard PHY—and until a critical mass of companies—including assembly, packaging and test firms—join the chiplet ecosystem, the chiplet market will remain small, and chiplet use will be limited to large semiconductor suppliers, such as AMD, Intel, MediaTek and Nvidia, which can afford to be pioneers.

However, the UCIe Consortium’s large and rapidly growing membership roster indicates a substantial amount of interest in chiplet technology. So it is likely that the momentum is already there and that chiplet technologies may be able to go mainstream in just a few years.
