Ingenieurgesellschaft für
technische Software

Performance Aspects

Through continuous further development of the equation solvers, PERMAS achieves a very high computation speed. Both direct and iterative solvers are continuously optimized.

Basic properties
  • Very good multitasking behavior due to a high degree of computer utilization and a low demand for central memory.
  • The central memory size used can be freely configured - without any limitation on the model size.
  • The disk space used can be split on several disks - without any logical partitioning (e.g. optimum disk utilization in a workstation network).
  • There are practically no limits on the model size, and the software itself imposes no explicit limits. Even models with many millions of degrees of freedom can be handled.
  • By using well-established libraries like BLAS for matrix and vector operations, PERMAS is adapted to the specific characteristics of hardware platforms and thus provides a very high efficiency.
  • A further increase in computing power has been achieved by an overall parallelization of the software. See also XPU.
  • By simultaneous use of several disks (so-called disk striping), the I/O performance can be raised beyond the characteristics of the single disks. Direct I/O is available for NVMe technology.
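
The idea behind disk striping can be sketched as follows. This is a minimal illustration only, not the PERMAS implementation: the chunk size `CHUNK`, the function names and the file names are all assumptions made for this example. Consecutive chunks of a byte stream are distributed round-robin over several files, which would normally sit on different physical disks.

```python
import os
import tempfile

CHUNK = 4  # stripe unit in bytes (tiny, for illustration only)

def stripe_write(data: bytes, paths: list[str]) -> None:
    """Distribute consecutive chunks of `data` round-robin over `paths`."""
    files = [open(p, "wb") for p in paths]
    try:
        for i in range(0, len(data), CHUNK):
            files[(i // CHUNK) % len(files)].write(data[i:i + CHUNK])
    finally:
        for f in files:
            f.close()

def stripe_read(paths: list[str], total: int) -> bytes:
    """Reassemble the original byte stream from the stripe files."""
    files = [open(p, "rb") for p in paths]
    try:
        out = bytearray()
        k = 0
        while len(out) < total:
            # chunk k lives in file k % n; the last chunk may be shorter
            out += files[k % len(files)].read(min(CHUNK, total - len(out)))
            k += 1
        return bytes(out)
    finally:
        for f in files:
            f.close()

def demo() -> bool:
    """Round trip over three hypothetical 'disks' (plain files here)."""
    with tempfile.TemporaryDirectory() as d:
        paths = [os.path.join(d, f"disk{i}.bin") for i in range(3)]
        data = bytes(range(26))
        stripe_write(data, paths)
        return stripe_read(paths, len(data)) == data
```

The sketch only shows the data layout. The performance gain of real striping comes from the stripe files residing on separate devices that are accessed concurrently, which this example does not attempt to reproduce.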

PERMAS is also fully available for parallel computers. A general parallelization approach allows the parallel processing of all time-critical operations without being limited to equation solvers. There is only one software version for both sequential and parallel computers. Finite element analysis is a "classical" field of high performance computing.

PERMAS supports parallelization on shared memory computers. There, the parallelization is based on POSIX Threads, i.e. PERMAS is executed in several parallel threads, which all use the same memory area. This avoids additional communication between the processors and corresponds fully with the overall architecture of such systems.

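
The shared-memory model described above can be illustrated with a small sketch. It is not PERMAS code; the function name `scale_in_place` and its parameters are assumptions for this example. Several threads work on disjoint slices of one list that lives in a single address space, so no data has to be copied or sent between workers.

```python
import threading

def scale_in_place(values: list[float], factor: float, n_threads: int) -> None:
    """Scale `values` in place using threads that share the same list."""
    def work(start: int, stop: int) -> None:
        for i in range(start, stop):
            values[i] *= factor  # each thread writes only to its own slice

    # ceiling division; at least 1 so that range() gets a valid step
    step = max(1, -(-len(values) // n_threads))
    threads = [
        threading.Thread(target=work, args=(s, min(s + step, len(values))))
        for s in range(0, len(values), step)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every slice has been processed

data = [1.0, 2.0, 3.0, 4.0, 5.0]
scale_in_place(data, 2.0, n_threads=2)
```

On Unix systems the Python `threading` module is built on POSIX Threads. Note that the global interpreter lock in CPython limits the speedup of CPU-bound Python code, so this sketch demonstrates the memory-sharing model rather than the attainable performance.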

In addition, PERMAS allows asynchronous I/O, which achieves better performance by overlapping CPU and I/O times. Moreover, an NVIDIA GPU may be used. See also INTEL.
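
How overlapping CPU and I/O times can pay off is sketched below. This is a generic double-buffering pattern under stated assumptions, not the PERMAS mechanism: `read_block` and `process` are hypothetical stand-ins for a slow disk read and the computation on one block.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def read_block(i: int) -> list[float]:
    """Hypothetical stand-in for a slow disk read of block i."""
    time.sleep(0.01)  # simulated I/O latency
    return [float(i)] * 4

def process(block: list[float]) -> float:
    """Hypothetical stand-in for the CPU work on one block."""
    return sum(x * x for x in block)

def run_overlapped(n_blocks: int) -> float:
    """Compute on block i while block i+1 is being read in the background."""
    total = 0.0
    if n_blocks == 0:
        return total
    with ThreadPoolExecutor(max_workers=1) as io:
        pending = io.submit(read_block, 0)
        for i in range(n_blocks):
            block = pending.result()  # wait for the current block
            if i + 1 < n_blocks:
                pending = io.submit(read_block, i + 1)  # prefetch the next one
            total += process(block)  # runs while the prefetch is in flight
    return total
```

With this pattern the elapsed time per block approaches the larger of the read time and the compute time, instead of their sum.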

Parallelization does not change the sequence of numerical operations in PERMAS, i.e. the results of a sequential analysis and a parallel analysis of the same model on the same machine are identical (if all other parameters remain unchanged).
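
Why the sequence of operations matters can be shown with a small sketch. Floating-point addition is not associative, so a parallel sum whose grouping depends on the number of workers may change the last bits of the result. The example below is an illustration under assumptions (the block size `BLOCK` and the function names are invented here) and not the method used in PERMAS: the data is split into blocks of fixed size, and the partial results are always combined in block order, so the result does not depend on the worker count.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK = 1000  # fixed block size, independent of the worker count

def block_sum(block: list[float]) -> float:
    total = 0.0
    for x in block:  # fixed left-to-right order inside a block
        total += x
    return total

def parallel_sum(values: list[float], workers: int) -> float:
    """Sum `values` with a result that is identical for any `workers`."""
    blocks = [values[i:i + BLOCK] for i in range(0, len(values), BLOCK)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(block_sum, blocks))  # results in block order
    total = 0.0
    for p in partials:  # fixed combination order
        total += p
    return total

values = [0.1 * (i % 7) + 1e-9 * i for i in range(10000)]
same = parallel_sum(values, 1) == parallel_sum(values, 2) == parallel_sum(values, 8)
```

Here `same` is true because the decomposition and the order of all additions are fixed in advance; only the assignment of blocks to workers varies.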

PERMAS is able to work with constant and pre-fixed memory for each analysis. This also holds for a parallel execution of PERMAS. So, several simultaneous sequential jobs as well as several simultaneous parallel jobs or any mix of sequential and parallel jobs are possible.

The parallelization is based on a mathematical approach, which allows the automatic parallelization of sequentially programmed software. So, PERMAS remains generally portable and the main goal has been achieved: One single PERMAS version for all platforms.

Parallel PERMAS is available for all UNIX/Windows platforms on which a sequential version is supported.

Due to the development of faster CPUs and higher I/O speeds in recent years, the gap to network speeds has widened. So, on distributed memory machines, acceptable speed-ups using parallelization are more difficult to achieve. Consequently, for the time being, shared memory architectures show much better speed-ups with PERMAS.

The parallel execution of PERMAS is very simple. Because there are no special commands necessary, a sequential run of PERMAS does not differ from a parallel one - except for the shorter run time. Only the number of parallel processes or processors for the PERMAS run has to be defined in advance.

[Figure: Parallel Performance - Eigenvalues, MLDR]

PERMAS-XPU - GPU Accelerator

PERMAS supports NVIDIA Tesla cards. Since 1996, PERMAS has used a unique parallelization concept: all matrix operations are parallelized at run time, based on a dynamically generated task graph of hierarchical block operations. This concept gives excellent speedups, especially on shared memory machines, and ensures bit-identical results independent of the number of cores or the amount of memory used.
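
A task graph of block operations can be sketched as follows. This is a strongly simplified illustration with a single block level, not the PERMAS implementation; the function `block_matmul`, its parameters and the use of Python's `graphlib` are assumptions made for this example. A matrix product is split into block updates, each update becomes a task, and the dependencies between tasks fix the order in which contributions are accumulated, whatever the number of workers.

```python
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

def block_matmul(A, B, nb: int, workers: int = 2):
    """Multiply square matrices A and B (nested lists) split into nb x nb blocks."""
    n = len(A)
    b = n // nb  # block edge length; assumes nb divides n
    C = [[0.0] * n for _ in range(n)]

    def task(i: int, j: int, k: int) -> None:
        # C[block i, block j] += A[block i, block k] * B[block k, block j]
        for r in range(i * b, (i + 1) * b):
            for c in range(j * b, (j + 1) * b):
                acc = C[r][c]
                for m in range(k * b, (k + 1) * b):
                    acc += A[r][m] * B[m][c]
                C[r][c] = acc

    # Task graph: update k of block (i, j) depends on update k-1 of the same
    # block, which fixes the accumulation order. Different (i, j) are independent.
    graph = {}
    for i in range(nb):
        for j in range(nb):
            for k in range(nb):
                graph[(i, j, k)] = {(i, j, k - 1)} if k > 0 else set()

    ts = TopologicalSorter(graph)
    ts.prepare()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while ts.is_active():
            ready = ts.get_ready()  # all tasks whose dependencies are done
            list(pool.map(lambda t: task(*t), ready))  # run one wave of tasks
            ts.done(*ready)
    return C

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]
C = block_matmul(A, B, nb=2)
```

Because every output block receives its contributions in the same order regardless of `workers`, the result is the same for any degree of parallelism, which is the property the text describes as bit-identical results.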

During the German MCSimVis project and the European H4H project, from 2009 to 2015, this concept was extended by a seamless integration of NVIDIA cards. An NVIDIA card may be used as an additional floating-point accelerator, much like plugging in an extra socket with additional CPU cores.

The combined work of all CPUs plus the GPU acceleration is available for any PERMAS analysis and is not restricted by any hardware resource. PERMAS is known for solving huge FEM simulation problems even on limited hardware resources. For example, working efficiently with TByte-sized matrices on a system with only a few GByte of memory is not a problem for PERMAS. This is supported by asynchronous handling of I/O and computations.
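
The principle of working on a matrix that is larger than the available memory (out-of-core processing) can be sketched with a matrix-vector product. This is only an illustration under assumptions: the file layout (raw doubles, row by row), the function names and the parameter `rows_per_block` are invented for this example and say nothing about the PERMAS data formats. The matrix stays on disk, and only a limited number of rows is held in memory at any time.

```python
import os
import tempfile
from array import array

def write_matrix(path: str, rows: list[list[float]]) -> None:
    """Store a dense matrix row by row as raw doubles."""
    with open(path, "wb") as f:
        for row in rows:
            array("d", row).tofile(f)

def out_of_core_matvec(path: str, n_rows: int, n_cols: int,
                       x: list[float], rows_per_block: int) -> list[float]:
    """Compute y = M x while holding at most `rows_per_block` rows in memory."""
    y = []
    with open(path, "rb") as f:
        for start in range(0, n_rows, rows_per_block):
            count = min(rows_per_block, n_rows - start)
            block = array("d")
            block.fromfile(f, count * n_cols)  # read one block of rows
            for r in range(count):
                row = block[r * n_cols:(r + 1) * n_cols]
                y.append(sum(a * b for a, b in zip(row, x)))
    return y

def demo() -> list[float]:
    """Small round trip through a temporary file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "matrix.bin")
        write_matrix(path, [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
        return out_of_core_matvec(path, 3, 2, [1.0, 1.0], rows_per_block=2)
```

The memory footprint is set by `rows_per_block` and not by the matrix size, which mirrors the idea of a fixed, preconfigured memory budget per analysis.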

Thus, the extra speedup from NVIDIA Tesla cards can be seen even for out-of-core simulations involving PBytes of local I/O. Typically, on standard single- or multi-socket compute servers, an extra Tesla card boosts PERMAS performance by a further factor of 2 to 4. A large contact analysis, for example, shows an overall speedup of 1.8 for the whole job.
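
One possible way to relate a factor of 2 to 4 to an overall job speedup of 1.8 is Amdahl's law, assuming (this is not stated in the source) that the larger factor applies only to the accelerated part of the run time. The sketch below is a generic formula; the fraction 2/3 and the factor 3 used in the example are purely illustrative values, not measurements.

```python
def overall_speedup(accelerated_fraction: float, factor: float) -> float:
    """Amdahl's law: job speedup when only part of the run time is accelerated.

    accelerated_fraction: share of the original run time that benefits (0..1)
    factor: speedup of that share
    """
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / factor)

# Illustrative assumption: two thirds of the run time is accelerated 3 times.
example = overall_speedup(2.0 / 3.0, 3.0)
```

Under these assumed values the whole job runs about 1.8 times faster, because the remaining third of the run time is not accelerated at all.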

[Figure: Contact Simulation]