Artificial intelligence has recently attracted renewed attention thanks to the remarkable capabilities of natural-language systems such as ChatGPT. At the same time, AI is quietly making an impact in other fields, chip design among them. As the complexity and precision requirements of chip design keep rising, traditional design methods can no longer meet demand, and the rapid development of AI technology has opened up new possibilities. More and more companies across the chip industry chain are exploring ways to use AI to assist chip design. So where does that leave chip engineers?
This article introduces some of the breakthroughs chip giants have made in applying AI to chip placement. A large part of why AI performs so well at this task is its excellent image-recognition ability.
Chip layout is becoming increasingly time-consuming
In very large-scale integration (VLSI) design, placement is one of the key steps in the chip design flow. The placement determines the physical position of every standard cell, and all of the different subsystems must be arranged so that signals and data propagate between these regions at the desired rate. Traditionally, this work has been done by hand: chip engineers typically spend weeks or months iterating on and optimizing their designs, trying to find the best configuration of standard cells.
Analytical placement is currently the most advanced technique for VLSI placement, helping designers achieve high-performance, low-power, high-reliability circuits in the smallest possible chip area. It typically involves three steps: global placement (GP), legalization (LG), and detailed placement (DP).
Global placement (GP) is the initial stage of arranging circuit components on the chip. Its goal is to give every component a roughly reasonable position without worrying about fine detail, providing a good starting point for the subsequent steps. Global placement applies a range of algorithms and techniques to balance area, power, timing, and connectivity, and of the three steps it is the most time-consuming part of analytical placement.
Legalization (LG) fine-tunes the positions produced by global placement to satisfy hard constraints such as the minimum spacing between components, the distance from the chip edge, and the orientation of neighboring components. Its purpose is to make component positions comply with the design rules while staying as close as possible to the global-placement solution.
Detailed placement (DP) then makes finer adjustments to component positions to further improve performance and reduce power. It usually involves more sophisticated algorithms and techniques, such as grid-based placement, global optimization, and local optimization.
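All three steps ultimately try to shrink the same cost metric: half-perimeter wirelength (HPWL), the sum over all nets of the width plus height of each net's pin bounding box. As a concrete illustration, here is a minimal pure-Python sketch; the cells, coordinates, and nets are invented for the example.

```python
def hpwl(positions, nets):
    """Half-perimeter wirelength: for each net, take the bounding box of
    its pins and add width + height; sum over all nets."""
    total = 0.0
    for net in nets:
        xs = [positions[cell][0] for cell in net]
        ys = [positions[cell][1] for cell in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

# Toy design: three cells, two 2-pin nets (purely illustrative values).
pos = {"a": (0.0, 0.0), "b": (2.0, 1.0), "c": (1.0, 3.0)}
nets = [["a", "b"], ["b", "c"]]
print(hpwl(pos, nets))  # (2 + 1) + (1 + 2) = 6.0
```

Placers cannot minimize this metric directly with gradients, since max/min are not smooth, which is why analytical placement relies on smooth surrogates.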
Today, however, chips keep growing in complexity and density. An advanced chip integrates tens of billions or even more than a hundred billion transistors: Apple's M1 Ultra packs 114 billion, and AMD's Instinct MI300 accelerator 146 billion. Combined with the intricate modern trade-offs among power, performance, and area (PPA), this makes chip placement ever more time-consuming and labor-intensive.
To accelerate and optimize the IC design process, chip makers are exploring deep-learning methods that can design chips faster and more efficiently than humans. With AI, designers can feed design requirements into a computer, which automatically recognizes and processes the layout imagery and places cells according to the specified rules and constraints. AI can generate chip layouts faster and more accurately while sparing designers the errors that creep into repetitive, tedious work. Many companies, including some of the largest in the technology industry, are now investing in AI tools to take on these heavy workloads.
Google uses AI for chip layout
As early as June 2021, Google published an article in the journal Nature titled "A graph placement methodology for fast chip design," claiming that its machine-learning software can design better chips faster than humans can. Google stated that it is using this software to design its in-house TPU chips.
Google wrote in the article: "Despite 50 years of research, chip layout still cannot be fully automated, and physical design engineers need months of hard work to create manufacturable layouts. In under six hours, our method automatically generates chip layouts that are superior or comparable to human-drawn designs in all key metrics."
Google treats chip floorplanning as a reinforcement-learning (RL) problem and developed an edge-based graph convolutional neural network architecture that can learn rich, transferable representations of chips. The evaluation workflow is shown in Figure 1. Every method gets access to the same clustered netlist hypergraph and, as far as possible, uses the same hyperparameters. After each method finishes (including RePlAce's legalization step), the macros are snapped onto a grid and their positions frozen, and a commercial EDA tool places the standard cells and reports the final results.
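To make the RL formulation concrete, here is a toy sketch of the sequential decision process the agent faces: macros are placed one at a time on a grid, already-occupied cells are masked out of the action space, and a reward arrives only after the last macro is down. The grid size, macro count, random stand-in policy, and Manhattan-distance reward proxy are all invented simplifications, not Google's actual setup (which uses a learned GNN policy and a weighted wirelength/congestion/density reward).

```python
import random

class ToyPlacementEnv:
    """Sequential macro placement on a grid with a sparse terminal reward."""

    def __init__(self, grid=4, n_macros=3, seed=0):
        self.grid, self.n_macros = grid, n_macros
        self.rng = random.Random(seed)
        self.reset()

    def reset(self):
        self.placed = []  # (row, col) of each placed macro, in order
        self.free = {(r, c) for r in range(self.grid) for c in range(self.grid)}
        return self.placed

    def step(self, cell):
        assert cell in self.free, "action mask violated"
        self.free.remove(cell)          # occupied cells leave the action space
        self.placed.append(cell)
        done = len(self.placed) == self.n_macros
        reward = 0.0
        if done:
            # Reward proxy: negative total Manhattan distance between
            # consecutive macros (stands in for wirelength/congestion).
            reward = -sum(abs(a[0] - b[0]) + abs(a[1] - b[1])
                          for a, b in zip(self.placed, self.placed[1:]))
        return self.placed, reward, done

env = ToyPlacementEnv()
done, reward = False, 0.0
while not done:
    action = env.rng.choice(sorted(env.free))  # random policy stands in for the GNN
    _, reward, done = env.step(action)
print(len(env.placed), reward <= 0)  # 3 True
```

In Google's system, the policy network that replaces the random choice above is trained across many netlists, which is what lets it transfer to unseen chip blocks.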
Figure 1: The evaluation workflow of Google using reinforcement learning to generate results
(Source: Google)
Google compared its approach with the state-of-the-art method RePlAce and with manual placement by experts using industry-standard EDA tools. The comparison covers metrics such as runtime, total area, power consumption, and wirelength; for all of them, lower is better. Google's reinforcement-learning method came out ahead of the other two.
(Source: Google)
Google's research team stated that as the AI is exposed to a greater number and variety of chips, it can generate optimized layouts for new chip blocks faster and better through continued training. Although the team mainly generated optimized layouts for Google's accelerator chips (TPUs), they say the method applies to any type of ASIC.
NVIDIA's DREAMPlace is moving toward commercial quality
On the acceleration front, existing parallelization work relies mainly on partitioned, multi-threaded CPUs: as the thread count grows, the speedup saturates at around 5x, typically with a 2-6% loss in quality. NVIDIA engineers have therefore explored using GPUs to accelerate analytical placement.
Developing a traditional analytical placement engine takes enormous effort, since the entire software stack must be built in C++, so the cost of designing and validating new placement algorithms is very high. NVIDIA instead leveraged the deep-learning toolkit PyTorch and, with minimal additional software development, built DREAMPlace, a GPU-accelerated open-source placement engine that has since become well known. It implements the key kernels of analytical placement, such as wirelength and density computation, as efficient GPU operators.
The framework is written in Python, using PyTorch for the optimizer and APIs and C++/CUDA for the low-level operators. DREAMPlace was benchmarked on a Linux server with a 40-core Intel E5-2698 v4 @ 2.20 GHz CPU and an NVIDIA Tesla V100 GPU (Volta architecture). Its key idea is to cast the analytical placement problem as a neural-network training problem and solve it with the same machinery.
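The core trick is that the non-differentiable max/min in wirelength is replaced by a smooth surrogate so that gradients can flow, exactly as in neural-network training. Below is a minimal pure-Python sketch of the log-sum-exp smoothing, one of the standard smooth wirelength models; DREAMPlace's actual operators also include a weighted-average model and run as CUDA kernels. The pin coordinates, gamma, and step size are invented for illustration.

```python
import math

def lse_wirelength(xs, gamma=1.0):
    """Smooth approximation of max(xs) - min(xs) along one axis
    (log-sum-exp overestimates max and underestimates min slightly)."""
    smax = gamma * math.log(sum(math.exp(x / gamma) for x in xs))
    smin = -gamma * math.log(sum(math.exp(-x / gamma) for x in xs))
    return smax - smin

def lse_grad(xs, gamma=1.0):
    """Analytic gradient of the smooth wirelength w.r.t. each coordinate
    (softmax minus softmin weights) -- what backpropagation computes."""
    e_pos = [math.exp(x / gamma) for x in xs]
    e_neg = [math.exp(-x / gamma) for x in xs]
    sp, sn = sum(e_pos), sum(e_neg)
    return [ep / sp - en / sn for ep, en in zip(e_pos, e_neg)]

# One gradient-descent step pulls the extreme pins of a net together.
xs = [0.0, 1.0, 4.0]          # x-coordinates of one net's pins (toy values)
g = lse_grad(xs)
xs2 = [x - 0.5 * gi for x, gi in zip(xs, g)]
print(lse_wirelength(xs2) < lse_wirelength(xs))  # True: wirelength decreased
```

Global placement repeats this kind of step across millions of nets at once, which is exactly the workload GPUs are good at.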
The software architecture of DREAMPlace is shown in Figure A, and the overall placement flow is shown in Figure B below.
(Image source: NVIDIA)
Compared with the state-of-the-art global placement family ePlace/RePlAce, DREAMPlace achieves over 30x speedup in global placement and legalization, with no quality degradation on academic or industrial benchmarks. Concretely, it can place a million-cell design in about a minute. NVIDIA explored different implementations of the low-level operators for forward and backward propagation (forward to evaluate the objective, backward to compute gradients) to improve overall efficiency.
Tables 2 and 3 above give the HPWL and runtime details for the ISPD 2005 and industrial benchmarks, respectively. Table 2 shows that at essentially the same solution quality (an average difference of 0.2%), DREAMPlace achieves 35x and 43x speedups in global placement over 40-thread RePlAce on the two benchmark suites.
DREAMPlace is also highly extensible: new algorithms, solvers, and objectives can be added simply by writing high-level code such as Python, and it scales to industrial designs of up to 10 million cells. NVIDIA plans to further investigate cell inflation and net weighting for routability and timing optimization, along with GPU-accelerated detailed placement, and the engine can be extended to multi-GPU platforms for further speedup. Because high-level algorithm design is decoupled from low-level acceleration, DREAMPlace significantly reduces development and maintenance costs. This work opens up new directions for re-examining classic EDA problems.
However, because it focuses only on wirelength and density, DREAMPlace's placement quality cannot match commercial tools, making it hard to deploy in industrial design flows. To address this, NVIDIA scientists recently proposed DREAM-GAN, a placement optimization framework that advances DREAMPlace using generative adversarial learning. DREAM-GAN's biggest advantage is that it lets DREAMPlace optimize placements toward what commercial tools would validate (and optimize) without explicitly modeling those tools' black-box algorithms: DREAMPlace is steered toward commercial-quality placements by optimizing against the discriminator's output.
DREAM-GAN
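The adversarial mechanism boils down to a small idea: the differentiable placement loss gets an extra term from a discriminator that scores how "commercial-quality" a placement looks, and minimizing the combined loss pulls the optimizer toward placements the critic likes. Everything below (the 1-D two-cell "placement", the hand-wired logistic discriminator, and the weight lam) is a toy stand-in for illustration, not the DREAM-GAN network.

```python
import math

def wirelength(x):
    # Toy 1-D "placement": two cell coordinates; wirelength is their spacing.
    return abs(x[0] - x[1])

def discriminator(x):
    """Toy frozen discriminator: outputs ~1 for placements it considers
    commercial-quality. Here it is hand-wired to prefer a spacing near 2."""
    spread = abs(x[0] - x[1])
    return 1.0 / (1.0 + math.exp(-(1.0 - (spread - 2.0) ** 2)))

def combined_loss(x, lam):
    # Placement objective plus adversarial term rewarding a high critic score.
    return wirelength(x) + lam * (1.0 - discriminator(x))

# Pure wirelength prefers collapsing the cells together; with a strong
# critic term, the "tool-like" spacing of 2 scores better instead.
print(combined_loss((0.0, 0.0), lam=10.0) > combined_loss((0.0, 2.0), lam=10.0))  # True
```

The real framework learns the discriminator from commercial-tool placements rather than hand-wiring it, so the critic term encodes whatever the black-box tool rewards.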
Experiments showed that DREAM-GAN not only improves the main PPA metrics at the placement stage, but that the gains persist through the post-routing stage, where wirelength improves by 8.3% and total power by 7.4%.
Detailed PPA comparison of DREAMPlace and DREAM-GAN at each major physical design (PD) stage. In this work, Synopsys ICC2 performs the entire PD flow except global placement, which is done by DREAMPlace (left column) or DREAM-GAN (right column). Across all commercial and OpenCores benchmarks, the runtime difference between the two methods on global placement is under two minutes.
NVIDIA scientists also recently presented AutoDMP at the International Symposium on Physical Design. AutoDMP was evaluated on the macro placement benchmarks from the TILOS AI Institute, which include CPU and AI accelerator designs with large numbers of macros. For evaluation, AutoDMP is integrated with commercial EDA tools, as shown in the figure below. First, multi-objective Bayesian optimization runs on an NVIDIA DGX system with four A100 GPUs, each with 80 GB of HBM memory; 16 parallel workers sample parameters and run DREAMPlace during optimization. The macro locations selected from the Pareto frontier are then fed to the TILOS-provided EDA flow running on a CPU server. On most designs, AutoDMP's PPA metrics, including wirelength, power, worst negative slack (WNS), and total negative slack (TNS), match or beat the commercial flow.
The calculation process of AutoDMP
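The Pareto-selection step in the flow above is easy to illustrate. In this sketch, each candidate is an invented (wirelength, power) result from one sampled DREAMPlace configuration, and only the non-dominated ones would be handed to the downstream EDA flow; the numbers stand in for the Bayesian optimizer's real samples.

```python
def pareto_front(points):
    """Keep the non-dominated points (lower is better on both objectives):
    a point is dropped if some other point is <= on both axes and differs."""
    return [p for p in points
            if not any(q[0] <= p[0] and q[1] <= p[1] and q != p for q in points)]

# Invented (wirelength, power) results from five sampled configurations.
candidates = [(10.0, 5.0), (8.0, 6.0), (9.0, 7.0), (12.0, 4.0), (11.0, 6.0)]
print(pareto_front(candidates))  # [(10.0, 5.0), (8.0, 6.0), (12.0, 4.0)]
```

Handing the whole front (rather than one "best" point) to the EDA flow is what lets designers trade wirelength against power after the expensive placement runs are done.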
Technological development is a double-edged sword. On one hand, AI can discover patterns by learning from existing chip design data and, by analyzing that data, deliver faster and more accurate design solutions. It can also improve design efficiency, shorten development time, and reduce costs.
On the other hand, as AI technology continues to develop and mature, some routine, less creative jobs may be replaced. As noted above, chip placement used to be done largely by hand. Current AI still has limitations, but as the technology keeps improving, the amount of manual work needed in the chip design process will shrink to some degree. That improves overall efficiency, but it may also cost some engineers their jobs.
There is no need to panic, though. Looking back at the industrial revolutions, each one displaced some jobs, workers, or engineers, yet also created new kinds of engineering roles and ultimately raised our productivity. In chip design specifically, AI will not completely replace manual work, because the industry still needs people who can correctly verify and apply AI tools and algorithms in the design process. The deeper effect on talent is to raise the value of IC designers, freeing them to focus on more complex and creative aspects of design and, ultimately, to produce better products.
Using AI to assist chip design and manufacturing has become a trend. Beyond Google and NVIDIA, EDA vendors such as Synopsys, Siemens, and Cadence have built AI into their latest tools, and Samsung has brought AI into chip manufacturing. These AI/ML methods will open new directions for advancing VLSI placement, and may well be one of the avenues that keeps Moore's Law running for a few more years.
[1] "A graph placement methodology for fast chip design"
[2] "DREAMPlace: Deep Learning Toolkit-Enabled GPU Acceleration for Modern VLSI Placement"
[3] "DREAM-GAN: Advancing DREAMPlace towards Commercial-Quality using Generative Adversarial Learning"