Presentation¶

Let $\Omega_h$ be a triangular (2d) or tetrahedral (3d) mesh of $\Omega\subset\R^d$ corresponding to the following data structure:

$\mbox{\begin{tabular}{lccll} \hline \textbf{name} & \textbf{type} & \textbf{dimension} & \textbf{description}\\ \hline $\nq$ & integer & 1 & number of vertices\\ $\nme$ & integer & 1 & number of elements\\ $\q$ & double & $d \times \nq$ & \begin{minipage}[t]{7.9cm} array of vertices coordinates. $\q(\nu,j)$ is the $\nu$-th coordinate of the $j$-th vertex, $\nu\in\{1,\hdots,d\}$, $j\in\{1,\hdots,\rm{n_q}\}.$ The $j$-th vertex will be also denoted by $\rm{q}^j$ \end{minipage}\\ $\me$ & integer & $(d+1) \times \nme$ & \begin{minipage}[t]{7.9cm} connectivity array. $\me(\beta,k)$ is the storage index of the $\beta$-th vertex of the $k$-th element, in the array~$q$, for $\beta\in\{1,\hdots,(d+1)\}$ and $k\in\{1,\hdots,{\nme}\}$ \end{minipage}\\ $\rm volumes$ & double & $1\times {\nme}$ & \begin{minipage}[t]{7.9cm} array of volumes in 3d or areas in 2d. ${\rm volumes}(k)$ is the $k$-th element volume (3d) or element area (2d), $k\in\{1,\hdots,{\nme}\}$ \end{minipage}\\ \hline \end{tabular}}$

Scalar finite elements :

Let $\HUnHD{\DOMH}=\{v \in {\cal C}^0(\overline \DOMH),\ \ {v}_{|T} \in {\cal P}_1(T), \ \forall T \in {\cal T}_h \},$ be the finite dimensional space spanned by the ${\cal P}_1$ -Lagrange (scalar) basis functions $\{\varphi_i\}_{i\in\ENS{1}{\nq}}$ where ${\cal P}_1(T)$ denotes the space of all polynomials defined over T of total degree less than or equal to $1.$ The functions $\varphi_i$ satisfy

$\varphi_i(\rm{q}^j)=\delta_{i,j},\ \ \forall (i,j)\in\{1,\hdots,\rm{n_q}\}^2$

Then we have $\varphi_i\equiv 0$ on $T_k$ , $\forall k\in\{1,\hdots,{\nme}\}$ such that $\q^i\notin T_k$ .

A ${\cal P}_1$ -Lagrange (scalar) finite element matrix is of the generic form

$\MAT{D}_{i,j}=\int_\DOMH \mathcal{D}(\varphi_j,\varphi_i) d\DOMH,\ \ \forall (i,j)\in\ENS{1}{\nq}^2$

where $\mathcal{D}$ is the bilinear differential operator (order $1$ ) defined for all $u,v\in \HUnHD{\DOMH}$ by

$\mathcal{D}(u,v)=\DOT{\MAT{A}\GRAD u}{\GRAD v} -\left(u\DOT{\vecb{b}}{\GRAD v} - v \DOT{\GRAD u}{\vecb{c}}\right) +d uv$

and $\MAT{A}\in (\LInfD{\DOMH})^{d\times d},$ $\vecb{b}\in(\LInfD{\DOMH})^{d},$ $\vecb{c}\in(\LInfD{\DOMH})^{d}$ and $d\in\LInfD{\DOMH}$ are given functions on $\DOMH.$

We can notice that $\MAT{D}$ is a sparse matrix due to support properties of $\varphi_i$ functions.

Let $\ndfe=d+1.$ The element matrix $\MAT{D}_{\il,\jl}(T_k)$ , associated to $\MAT{D}$ , is the $\ndfe\times\ndfe$ matrix defined by

$\MAT{D}_{\il,\jl}(T_k)=\int_{T_k} \mathcal{D}(\varphi_{\jl}^k,\varphi_{\il}^k)(\q) d\q,\ \ \forall (\il,\jl)\in\{1,\hdots,\ndfe\}^2$

where $\varphi_{\il}^k=\varphi_{\me(\il,k)}$ is the $\il$ -th local basis function associated to the $k$ -th element.

For examples,
- the Mass matrix is defined by
  
  $\MAT{M}_{i,j}=\int_\DOMH \BasisFunc_j \BasisFunc_i d\DOMH$
  
  The corresponding bilinear differential operator $\mathcal{D}$ is completely defined with $\MAT{A}=0$ , $\vecb{b}=\vecb{c}=0$ and $d=1.$
- the Stiffness matrix is defined by
  
  $\MAT{S}_{i,j}=\int_\DOMH \DOT{\GRAD\BasisFunc_j}{\GRAD \BasisFunc_i} d\DOMH$
  
  The corresponding bilinear differential operator $\mathcal{D}$ is completely defined with $\MAT{A}=\MAT{I}$ , $\vecb{b}=\vecb{c}=0$ and $d=0.$
Vector finite elements :

The dimension of the space $(\HUnHD{\DOMH})^d$ is $\ndf=d\,\nq.$
- in dimension $d=2,$ we have two natural basis :
  
  the global alternate basis $\mathcal{B}_a$ defined by
  
  $\mathcal{B}_a=\{\FoncBaseDeuxD_1,\hdots,\FoncBaseDeuxD_{2\rm{n_q}}\}=\left\{ \begin{pmatrix} \varphi_1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ \varphi_1 \end{pmatrix}, \begin{pmatrix} \varphi_2 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ \varphi_2\end{pmatrix}, \hdots, \begin{pmatrix} \varphi_{\rm n_q} \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ \varphi_{\rm n_q} \end{pmatrix} \right\}$
  
  and the global block basis $\mathcal{B}_b$ defined by
  
  $\mathcal{B}_b=\{\FoncBaseDeuxDB_1,\hdots,\FoncBaseDeuxDB_{2\rm{n_q}}\}=\left\{ \begin{pmatrix} \varphi_1 \\ 0 \end{pmatrix}, \begin{pmatrix} \varphi_2 \\ 0 \end{pmatrix},\hdots, \begin{pmatrix} \varphi_{\rm n_q} \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ \varphi_1 \end{pmatrix}, \begin{pmatrix} 0 \\ \varphi_2\end{pmatrix},\hdots, \begin{pmatrix} 0 \\ \varphi_{\rm n_q} \end{pmatrix} \right\}.$
- in dimension $d=3,$ we have two natural basis :
  
  the global alternate basis $\mathcal{B}_a$ defined by
  
  $\mathcal{B}_a=\{\FoncBaseDeuxD_1,\hdots,\FoncBaseDeuxD_{3\nq}\}=\left\{ \begin{pmatrix} \varphi_1 \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ \varphi_1 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ \varphi_1 \end{pmatrix}, \hdots, \begin{pmatrix} \varphi_\nq \\ 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ \varphi_\nq \\ 0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0 \\ \varphi_\nq \end{pmatrix} \right\}$
  
  and the global block basis $\mathcal{B}_b$ defined by
  
  $\mathcal{B}_b=\{\FoncBaseDeuxDB_1,\hdots,\FoncBaseDeuxDB_{3\nq}\}=\left\{ \begin{pmatrix} \varphi_1 \\ 0 \\ 0\end{pmatrix},\hdots, \begin{pmatrix} \varphi_\nq \\ 0 \\0\end{pmatrix}, \begin{pmatrix} 0 \\ \varphi_1 \\ 0\end{pmatrix},\hdots, \begin{pmatrix} 0 \\ \varphi_\nq \\0 \end{pmatrix}, \begin{pmatrix} 0 \\ 0\\ \varphi_1 \end{pmatrix},\hdots, \begin{pmatrix} 0 \\ 0\\ \varphi_\nq \end{pmatrix} \right\}.$
A ${\cal P}_1$ -Lagrange (vector) finite element matrix in $\mathcal{B}_a$ basis is of the generic form

$\MAT{H}_{l,m}=\int_\DOMH \mathcal{H}(\BasisFuncTwoD_m,\BasisFuncTwoD_l) d\DOMH,\ \ \forall (l,m)\in\ENS{1}{d\nq}^2$

where $\mathcal{H}$ is the bilinear differential operator (order $1$ ) defined by

$\mathcal{H}(\vecb{u},\vecb{v})=\sum_{\il,\jl=1}^d \mathcal{D}^{\il,\jl}(u_\il,v_\jl)$

and $\mathcal{D}^{\il,\jl}$ is a bilinear differential operator of scalar type.

For example, in dimension $d=2,$ the Elastic Stiffness matrix is defined by

$\mathbb{K}_{m,l}=\int_{\Omega_h} \underline{\pmb{\epsilon}}^t(\psi_l(q)) \MAT{C} \underline{\pmb{\epsilon}}(\psi_m(q)) , \ \ \forall (m,l) \in \{1,...,2\rm{n_q}\}^2.$

where $\underline{\pmb{\epsilon}}=(\epsilon_{xx},\epsilon_{yy},\epsilon_{xy})^t$ is the strain tensor respectively and $\MAT{C}$ is the Hooke matrix

$\MAT{C}=\begin{pmatrix} \lambda+2\mu & \lambda & 0\\ \lambda & \lambda+2\mu & 0\\ 0 & 0 & \mu \end{pmatrix}$

Then the bilinear differential operator associated to this matrix is given by

$\begin{array}{lcl} \mathcal{H}(\vecb{u},\vecb{v})&=& \DOT{\begin{pmatrix} \lambda+2\mu & 0\\ 0 & \mu \end{pmatrix}\GRAD u_1 }{\GRAD v_1} +\DOT{\begin{pmatrix} 0 & \mu\\ \lambda & 0 \end{pmatrix}\GRAD u_2 }{\GRAD v_1}\\ &+&\DOT{\begin{pmatrix} 0 & \lambda\\ \mu & 0 \end{pmatrix}\GRAD u_1 }{\GRAD v_2}+ \DOT{\begin{pmatrix} \mu & 0\\ 0 &\lambda+2\mu\end{pmatrix}\GRAD u_2 }{\GRAD v_2} \end{array}$

We present now three algorithms (base, OptV1 and OptV2 versions) for assembling this kind of matrix.

Note

These algorithms can be successfully implemented in various interpreted languages under some assumptions. For all versions, it must have a sparse matrix implementation. For OptV1 and OptV2 versions, we also need a particular sparse matrix constructor (see Sparse matrix requirement). And, finally, OptV2 also required that the interpreted languages support classical vectorization operations. Here is the current list of interpreted languages for which we have successfully implemented these three algorithms :

Python with Numpy and Scipy modules,

Matlab,

Octave,

Scilab

Classical assembly algorithm (`base` version)¶

Due to support properties of $P_1$ -Lagrange basis functions, we have the classical algorithm :

Note

We recall the classical matrix assembly in dimension $d$ with $\ndfe=d+1.$

Figure 43: Classical matrix assembly in 2d or 3d

In fact, for each element we add its element matrix to the global sparse matrix (lines 4 to 10 of the previous algorithm). This operation is illustrated in the following figure in 2d scalar fields case :

Figure 44: Adding of an element matrix to global matrix in 2d scalar fields case

The repetition of elements insertion in sparse matrix is very expensive.

Sparse matrix requirement¶

The interpreted language must contain a function to generate a sparse matrix M from three 1d arrays of same length Ig, Jg and Kg such that M(Ig(k),Jg(k))=Kg(k) . Furthermore, the elements of Kg having the same indices in Ig and Jg must be summed.

We give for several interpreted languages the corresponding function :

Python (scipy.sparse module) : M=sparse.<format>_matrix(Kg,(Ig,Jg),shape=(m,n) where <format> is the sparse matrix format chosen in csc , csr , lil ,...
Matlab : M=sparse(Ig,Jg,Kg,m,n), only csc format.
Octave : M=sparse(Ig,Jg,Kg,m,n), only csc format.
Scilab : M=sparse([Ig,Jg],Kg,[m,n]), only row-by-row format.

Obviously, this kind of function exists in compiled languages. For example, in C language, one can use the SuiteSparse from T. Davis and with Nvidia GPU, one can use Thrust library.

Optimized classical assembly algorithm (`OptV1` version)¶

The idea is to create three global 1d-arrays $\vecb{I}_g,$ $\vecb{J}_g$ and $\vecb{K}_g$ allowing the storage of the element matrices as well as the position of their elements in the global matrix. The length of each array is $\ndfes \nme.$ (i.e. $9 \nme$ for $d=2$ and $16 \nme$ for $d=3$ ). Once these arrays are created, the matrix assembly is obtained with one of the previous commands.

To create these three arrays, we first define three local 1d-arrays $\vecb{K}^e_k,$ $\vecb{I}^e_k$ and $\vecb{J}^e_k$ of $\ndfes$ elements obtained from a generic element matrix $\MAT{E}(T_k)$ of dimension $\ndfe$ :

From these arrays, it is then possible to build the three global arrays $\vecb{I}_g,$ $\vecb{J}_g$ and $\vecb{K}_g,$ of size $\ndfes\nme \times 1$ defined, $\forall k \in \ENS{1}{\nme},$ $\forall l \in\ENS{1}{\ndfes},$ by

$\begin{array}{lcl} \vecb{K}(\ndfes(k-1)+l)&=&\vecb{K}^e_k(l),\\ \vecb{I}(\ndfes(k-1)+l)&=&\vecb{I}^e_k(l)=\mathcal{I}^k((l-1) \bmod m +1),\\ \vecb{J}(\ndfes(k-1)+l)&=&\vecb{J}^e_k(l)=\mathcal{I}^k(E((l-1)/ m)+1). \end{array}$

So, for each triangle $T_k$ , we have

Then, a simple algorithm can build these three arrays using a loop over each triangle.

Note

We give the complete OptV1 algorithm

New Optimized assembly algorithm (`OptV2` version)¶

We present the optimized version 2 algorithm where no loop is used. We define three 2d-arrays that allow to store all the element matrices as well as their positions in the global matrix. We denote by $\MAT{K}_g,$ $\MAT{I}_g$ and $\MAT{J}_g$ these $\ndfes-by-\nme$ arrays, defined $\forall k \in \{1,\hdots,\nme\},$ $\forall il \in\{1,\hdots,\ndfes\}$ by

(1) $\MAT{K}_g(il,k)=\vecb{K}^e_k(il),\quad\quad \MAT{I}_g(il,k)=\vecb{I}^e_k(il),\quad\quad \MAT{J}_g(il,k)=\vecb{J}^e_k(il).$

The three local arrays $\vecb{K}^e_k,$ $\vecb{I}^e_k$ and $\vecb{J}^e_k$ are thus stored in the $k^{th}$ column of the global arrays $\MAT{K}_g,$ $\MAT{I}_g$ and $\MAT{J}_g$ respectively.

A natural way to build these three arrays consists in using a loop through the triangles $T_k$ in which we insert the local arrays column-wise

Figure 48: Construction of $\mathbb{K}_g$ , $\mathbb{I}_g$ and $\mathbb{J}_g$ with loop

The natural construction of these three arrays is done column-wise. So, for each array there are $\nme$ columns to compute, which depends on the mesh size. To vectorize, we must fill these arrays by row-wise operations and then for each array there are $\ndfes$ rows to compute. We recall that $\ndfe$ does not depend on the mesh size. These rows insertions are represented in Figure 49 . We can also remark that, $\forall (\il,\jl)\in\ENS{1}{\ndfe}$ , with $m=\ndfe,$

$\MAT{K}_g(m(\jl-1)+\il,k)=e^k_{\il,\jl},\ \ \MAT{I}_g(m(\jl-1)+\il,k)=\mathcal{I}^k_\il,\ \ \MAT{J}_g(m(\jl-1)+\il,k)=\mathcal{I}^k_\jl.$

where, in scalar fields case, $\mathcal{I}^k_\il=\me(\il,k),$ $\forall \il\in\ENS{1}{d+1}$ and in vector fields case, $\mathcal{I}^k_{d a - b}=d \me(a,k)-b,$ $\forall a \in \ENS{1}{d+1},$ $\forall b \in \ENS{0}{d-1}.$

Figure 49: Row-wise operations on global arrays $\mathbb{I}_g,$ $\mathbb{J}_g$ and $\mathbb{K}_g$

As we can see in Figures 50 and 51, it is quite easy to vectorize $\MAT{I}_g$ and $\MAT{J}_g$ computations in scalar fields case by filling these arrays lines by lines :

Figure 50: Construction of $\mathbb{I}_g$ and $\mathbb{J}_g$ : OptV2 version for 2d scalar matrix

Figure 51: Construction of $\mathbb{I}_g$ and $\mathbb{J}_g$ : OptV2 version for 3d scalar matrix

Using vectorization tools, we can compute $\MAT{I}_g$ and $\MAT{J}_g$ in one line. The vectorized algorithms in 2d and 3d scalar fields are represented by Figure 52.

Figure 52: Vectorized algorithms for $\mathbb{I}_g$ and $\mathbb{J}_g$ computations

In vector fields case, we construct the tabular $\MAT{T}$ such that $\MAT{T}(\alpha,k)=\mathcal{I}^k_\il,$ $\forall \il\in\ENS{1}{\ndfe}.$ Then we can vectorize $\MAT{I}_g$ and $\MAT{J}_g$ computations in vector fields case. We represent in Figure 53 these operations in 2d.

Figure 53: Construction of $\mathbb{I}_g$ and $\mathbb{J}_g$ : OptV2 version in 2d vector fields case

Now, it remains to vectorize the computation of $\mathbb{K}_g$ array which contains all the element matrices associated to $\MAT{D}$ or $\MAT{H}$ : it should be done by row-wise vector operations.

We now describe a vectorized construction of $\mathbb{K}_g$ array associated to a generic bilinear form $a$ . For the mesh element $T_k$ , the element matrix is then given by

$\MAT{E}(T_k)=\begin{pmatrix} a(\varphi_{i_1^k},\varphi_{i_1^k}) & a(\varphi_{i_1^k},\varphi_{i_2^k}) & a(\varphi_{i_1^k},\varphi_{i_3^k})\\ a(\varphi_{i_2^k},\varphi_{i_1^k}) & a(\varphi_{i_2^k},\varphi_{i_2^k}) & a(\varphi_{i_2^k},\varphi_{i_3^k})\\ a(\varphi_{i_3^k},\varphi_{i_1^k}) & a(\varphi_{i_3^k},\varphi_{i_2^k}) & a(\varphi_{i_3^k},\varphi_{i_3^k}) \end{pmatrix}$

Figure 54: Construction of $\mathbb{K}_g$ for Mass matrix

Figure 55: Construction of $\mathbb{K}_g$ for Mass matrix

Presentation¶

Classical assembly algorithm (`base` version)¶

Sparse matrix requirement¶

Optimized classical assembly algorithm (`OptV1` version)¶

New Optimized assembly algorithm (`OptV2` version)¶

Table Of Contents

Previous topic

Next topic

This Page

Navigation

Presentation¶

Classical assembly algorithm (base version)¶

Sparse matrix requirement¶

Optimized classical assembly algorithm (OptV1 version)¶

New Optimized assembly algorithm (OptV2 version)¶

Table Of Contents

Previous topic

Next topic

This Page

Quick search

Navigation

Classical assembly algorithm (`base` version)¶

Optimized classical assembly algorithm (`OptV1` version)¶

New Optimized assembly algorithm (`OptV2` version)¶