Sparse Matrix - SPICE仿真中的稀疏矩阵应用 (1)

Posted by Jing on November 12, 2020

SPICE仿真简介

读者可能还记得,他们花了几个小时在一个工程实验室里,煞费苦心地对电路进行测试:仔细地定位电阻和引线,在正确的位置施加电压,测量电流,并对设计进行调整,直到电路正确运行。

虽然该方法有助于测试相对简单的离散设计,但不能用于复杂的集成电路,因为它们的多层是通过光刻技术生成的。在这种情况下,研究人员转向电路仿真软件来测试他们的设计,然后再送到制造。

着重集成电路的模拟程序SPICE于70年代在伯克利创立,后来发展并引发了许多版本的开发,有些是开源的,有些是商业的。各种程序根据电路的布局和元件生成微分方程组,然后使用几种方法中的一种来求解方程组,从而“模拟”电路。

Why SPICE need Sparse Matrix?

Readers may remember spending hours in an engineering lab, painstakingly breadboarding a circuit to test a design: carefully positioning resistors and leads, applying voltage in the right place, measuring currents, and making adjustments to the design until the circuit behaved correctly. Although helpful for testing relatively simple discrete designs, the method can’t be used for complex integrated circuits, with their multiple layers generated by photo-lithography. In that case, researchers turn to circuit simulation software to test their designs before they are sent to be manufactured. SPICE—Simulation Program with Integrated Circuit Emphasis—was created at Berkeley in the 70s, and has since evolved and sparked the development of numerous versions, some open-source and others commercial. The various programs generate sets of differential equations based on the layout and components of a circuit, then use one of several methods to solve the equations, thereby “simulating” the circuit. One method used in many circuit simulation programs involves sparse matrix techniques. For small circuits, most solvers work very quickly, with little discernible difference in running times. As circuits get larger, however, the sparse matrix factorization algorithm used increasingly affects performance. 许多电路仿真程序中使用的一种方法涉及稀疏矩阵技术。对于小型电路,大多数解算器工作非常迅速,运行时间几乎没有明显差别。然而,随着电路规模的增大,稀疏矩阵分解算法的使用越来越影响性能。

Tim Davis, an associate professor of computer science at the University of Florida, has developed a high-performance algorithm for factoring matrices arising in circuit simulation. He presented some of the underlying background in an invited talk in Boston last summer at the SIAM Annual Meeting. The algorithm has been implemented in several versions of SPICE, both commercial and open-source. Davis calls the LU decomposition-based algorithm KLU—LU for its expression of matrices as products of lower and upper triangular matrices, K for Clark Kent. 佛罗里达大学计算机科学副教授Tim Davis开发了一种高性能的分解电路仿真矩阵的算法。去年夏天,他在波士顿暹罗年会的一次受邀演讲中介绍了一些基本背景。该算法已在多个SPICE版本中实现,包括商业版本和开源版本。戴维斯把基于LU分解的算法称为KLU-LU,因为它把矩阵表示为上下三角矩阵的乘积,K代表克拉克肯特。

The KLU algorithm, according to Davis, is “somewhere between Gilbert/Peierls’ left-looking LU and Li, Demmel, and Gilbert’s SuperLU.” As to Clark Kent, Davis thinks of KLU as “what SuperLU was before it became super.” 戴维斯认为,KLU算法是“介于吉尔伯特/佩尔斯的左撇子卢和李、德梅尔和吉尔伯特的SuperLU之间。”至于克拉克·肯特,戴维斯认为KLU是“超级之前的SuperLU”

The key step of the Gilbert/Peierls algorithm (GPLU) is the solution of Lx = b, where L, x, and b are sparse. In this step, the columns of L and U are computed, one at a time. GPLU does not specify a fill-reducing ordering, a choice that is crucial for circuit simulation matrices. SuperLU takes advantage of supernodes—collections of adjacent columns that have the same nonzero pattern. Circuit matrices are typically too sparse to allow efficient exploitation of supernodes, however, which means that the “super” nature of SuperLU is not very useful in circuit simulation. Gilbert/Peierls算法(GPLU)的关键步骤是求解Lx=b,其中L、x和b是稀疏的。在这一步中,L列和U列一次计算一列。GPLU没有指定填充减少顺序,这是电路仿真矩阵的关键选择。SuperLU利用了supernode——具有相同非零模式的相邻列的集合。然而,电路矩阵通常过于稀疏,无法有效利用超级节点,这意味着超级逻辑单元的“超级”特性在电路仿真中并不是很有用。

One notable implementation of KLU is in Xyce, Sandia National Laboratories’ version of SPICE. Created in 1999 to fill the need for a distributed-memory parallel circuit simulation tool at Sandia, Xyce was designed as a parallel code. KLU was first implemented in Xyce in 2004; on its official release in 2005, the software performed substantially better in certain cases. KLU builds on GPLU by using preordering strategies not included in GPLU or SuperLU, but avoids the supernode-based strategy of SuperLU. Very little fill-in occurs with the ordering used by KLU. The resulting algorithm has improved the speed of circuit simulation in many cases. KLU是在GPLU的基础上构建的,它使用GPLU或SuperLU中没有包含的预购策略,但是避免了SuperLU基于超级节点的策略。KLU使用的订单很少填充。该算法在许多情况下提高了电路仿真的速度。

“Xyce with KLU has been used to perform critical variability studies wherein hundreds of versions of Xyce are run, in serial on a parallel computer cluster,” says Scott Hutchinson, manager of electrical and microsystems modeling at Sandia. “The KLU solver’s improved performance . . . [has decreased] the linear solution times anywhere from 20% to an order of magnitude as the problem size increases.” KLU的一个显著的实现是Xyce,Sandia国家实验室的SPICE版本。Xyce于1999年创建,旨在满足Sandia对分布式内存并行电路仿真工具的需求,它被设计成一个并行代码。KLU于2004年首次在Xyce中实现;在2005年正式发布时,该软件在某些情况下表现得更好。

The algorithm could have a huge impact on overall simulation performance, Hutchinson explains, because of the order of operations performed during a linear solve. “With most linear problems, what makes them time consuming is that the number of operations . . . scales superlinearly with the number of unknowns.” The KLU solve, as “the �inner kernel’ of a nonlinear and time-integration solve, can get called millions of times. Thus, speed improvements at this level can have a big impact on the overall time to solution.” “Xyce和KLU已被用于执行关键的可变性研究,其中数百个版本的Xyce在并行计算机集群上串行运行,”Sandia电气和微系统建模经理Scott Hutchinson说KLU解算器的改进性能。随着问题规模的增大,[已将]线性求解时间从20%减少到一个数量级。”

哈钦森解释说,由于线性求解过程中执行的操作顺序不同,该算法可能会对整体模拟性能产生巨大影响。”对于大多数线性问题,使它们耗时的是运算的数量。KLU解作为非线性时间积分解的“内核”,可以被称为数百万次。因此,这一级别的速度提高会对解决问题的总体时间产生很大影响。”

KLU is a strong example of specialized improvement: Davis developed it expressly for circuit simulation, and the improvements over SuperLU are limited to circuits. KLU also improves on another factorization method, the Cholesky decomposition, in the case of circuit simulation: While Cholesky can be twice as efficient as KLU and other methods for solving linear equations, it typically fails in circuit simulation. KLU是专门改进的一个很好的例子:Davis专门为电路模拟开发了它,而对SuperLU的改进仅限于电路。KLU在电路仿真中还改进了另一种因子分解方法Cholesky分解:虽然Cholesky在求解线性方程方面的效率是KLU和其他方法的两倍,但它在电路模拟中通常失败。

“Cholesky is good only if the matrix is symmetric and positive-definite. It fails otherwise,” Davis says. “Circuit matrices are usually not symmetric.” “Cholesky只有在矩阵对称且正定的情况下才是好的。否则它就失败了,”戴维斯说电路矩阵通常不是对称的。”

Additionally, Davis points out, SuperLU and UMFPACK (another sparse linear system solver, which is used in MATLAB) are better for discretizations of, say, two- and three-dimensional differential equations, but inferior to KLU for circuits. 此外,Davis指出,SuperLU和UMFPACK(另一种稀疏线性系统求解器,在MATLAB中使用)更适合于离散二维和三维微分方程,但在电路方面不如KLU。

“Circuits do not have a 2D or 3D geometry; they look more like networks, not meshes,” he says, explaining KLU’s advantage for circuits. “电路没有二维或三维几何结构;它们看起来更像网络,而不是网格,”他解释道,解释KLU的电路优势。

Despite being created for circuit simulation, KLU may also have specialized applications elsewhere. Davis suggests that it might also be an improvement over SuperLU and other methods when used for chemical process simulation, where the very sparse, irregular network-like structure of the matrices involved is similar to that of circuit simulation matrices. 尽管KLU是为电路模拟而创建的,但它也可能在其他地方有专门的应用。Davis认为,当用于化学过程模拟时,它也可能是对SuperLU和其他方法的一个改进,在这些方法中,所涉及的矩阵的非常稀疏、不规则的网络状结构与电路模拟矩阵相似。

In the meantime, however, Davis is still working with commercial developers of other SPICE-type simulators to further improve KLU’s performance in circuit simulation. Hutchinson and his colleagues at Sandia currently use KLU only as a serial solver, but would like to expand its use for larger-scale problems. 与此同时,戴维斯仍在与其他SPICE类型模拟器的商业开发人员合作,以进一步提高KLU的电路模拟性能。哈钦森和他在Sandia的同事目前只使用KLU作为串行解算器,但希望将其用于更大规模的问题。

Currently, “at some problem size—say 100,000 devices—we switch to using parallel computers and iterative solvers” instead of KLU, Hutchinson says. “We are working with Tim on a parallel version of KLU to improve things further as the problems scale.” 哈钦森说,目前,“在某些问题上,比如说10万台设备,我们转而使用并行计算机和迭代求解器”,而不是KLU我们正在与Tim合作开发KLU的并行版本,以便随着问题的扩大而进一步改进。