Through ongoing development of its equation solvers, PERMAS achieves a very high computation speed. Both direct and iterative solvers are continuously optimized.
- Very good multitasking behavior due to a high degree of computer utilization and a low demand for central memory.
- The central memory size used can be freely configured - without any limitation on the model size.
- The disk space used can be split across several disks - without any logical partitioning (e.g. optimum disk utilization in a workstation network).
- There are practically no limits on the model size, and the software itself imposes no explicit limits. Even models with many millions of degrees of freedom can be handled.
- By using well-established libraries like BLAS for matrix and vector operations, PERMAS is adapted to the specific characteristics of hardware platforms and thus provides a very high efficiency.
- A further increase in computing performance has been achieved by an overall parallelization of the software.
- By simultaneous use of several disks (so-called disk striping) the I/O performance can be raised beyond that of a single disk.
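
The disk striping idea mentioned in the last item can be sketched as follows. This is only an illustration of the data layout, not the PERMAS implementation: ordinary files stand in for separate disks, and the function names `stripe_write` and `stripe_read` as well as the block size are assumptions made for this example. A real performance gain additionally requires that the files sit on different physical devices and are accessed concurrently.

```python
def stripe_write(data, paths, block_size):
    """Distribute `data` round-robin, block by block, over the files in `paths`.

    Hypothetical helper: each path stands in for one physical disk.
    """
    files = [open(p, "wb") for p in paths]
    try:
        for offset in range(0, len(data), block_size):
            block_index = offset // block_size
            # Block k goes to "disk" k modulo the number of disks.
            files[block_index % len(files)].write(data[offset:offset + block_size])
    finally:
        for f in files:
            f.close()


def stripe_read(paths, block_size, total):
    """Reassemble `total` bytes written by stripe_write with the same layout."""
    files = [open(p, "rb") for p in paths]
    try:
        out = bytearray()
        block_index = 0
        while len(out) < total:
            wanted = min(block_size, total - len(out))
            chunk = files[block_index % len(files)].read(wanted)
            if len(chunk) != wanted:
                raise IOError("striped data is incomplete")
            out += chunk
            block_index += 1
        return bytes(out)
    finally:
        for f in files:
            f.close()
```

With three files and a block size of 64 bytes, for example, blocks 0, 3, 6, ... land in the first file, blocks 1, 4, 7, ... in the second, and so on, so that consecutive blocks can in principle be transferred by different disks at the same time.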
PERMAS is also fully available for parallel computers. A general parallelization approach allows the parallel processing of all time-critical operations without being limited to equation solvers. There is only one software version for both sequential and parallel computers.
Because available parallel computers are based on either a shared memory or a distributed memory architecture, PERMAS offers two corresponding parallelization strategies:
- On shared memory computers the parallelization is based on POSIX Threads, i.e. PERMAS is executed in several parallel threads, which all use the same memory area. This avoids additional communication between the processors, which fully corresponds with the overall architecture of such systems.
- On distributed memory computers the parallelization is based on MPI (Message Passing Interface), a standard for communication between processes running on different processors. MPI is required to use several processors in parallel for solving a single analysis task.
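
The shared memory strategy can be illustrated with a small sketch. It is not taken from PERMAS; the function name `scale_in_place` and the slicing scheme are assumptions for this example. Several threads work on disjoint slices of one list that lives in a single address space, so no data has to be sent between them. Note that standard Python threads do not speed up CPU-bound work because of the interpreter lock; the sketch only shows the programming model.

```python
import threading


def scale_in_place(shared, factor, n_threads):
    """Multiply every entry of `shared` by `factor` using `n_threads` threads.

    Hypothetical example: each thread updates its own contiguous slice of the
    same list object, so no messages are exchanged between the threads.
    """
    n = len(shared)
    # Slice boundaries depend only on n and n_threads and never overlap.
    bounds = [(t * n // n_threads, (t + 1) * n // n_threads)
              for t in range(n_threads)]

    def worker(lo, hi):
        for i in range(lo, hi):
            shared[i] *= factor

    threads = [threading.Thread(target=worker, args=b) for b in bounds]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every slice has been processed
```

On a distributed memory machine the same task would require each process to hold its own part of the data and to exchange results explicitly through messages, which is what MPI provides.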
In addition, PERMAS supports asynchronous I/O on both architectures, which improves performance by overlapping CPU and I/O times.
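
How CPU and I/O times can overlap is sketched below. The example is an assumption-laden illustration, not the mechanism used inside PERMAS: `compute_block` is a hypothetical stand-in for a CPU-bound step, and a single background thread writes the previous block while the next one is being computed.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_block(k, size):
    """Hypothetical CPU-bound step that produces `size` bytes for block k."""
    return bytes((k + i) % 256 for i in range(size))


def run_overlapped(n_blocks, size, sink):
    """Compute blocks in order while the previous block is written in the background.

    `sink` is any object with a write(bytes) method, e.g. an open binary file.
    """
    # One worker thread keeps the writes in their original order.
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = None
        for k in range(n_blocks):
            block = compute_block(k, size)  # runs while the pending write is in flight
            if pending is not None:
                pending.result()            # wait for the previous write, surface I/O errors
            pending = pool.submit(sink.write, block)
        if pending is not None:
            pending.result()                # make sure the last block is written
```

At any moment at most one block is being written and one is being computed, so the memory needed stays bounded while the waiting time for the disk is hidden behind computation.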
Parallelization does not change the sequence of numerical operations in PERMAS, i.e. the results of a sequential analysis and a parallel analysis of the same model on the same machine are identical (if all other parameters remain unchanged).
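
Why the sequence of operations matters can be shown with floating-point summation, where a different order of additions may change the last digits of the result. The following sketch is an illustration under stated assumptions, not the PERMAS algorithm: the names `ordered_sum` and `chunked_sum` are hypothetical, the chunk boundaries are fixed independently of the number of workers, and the partial sums are always combined in the same order. Under these conditions the sequential and the parallel run perform exactly the same additions.

```python
from concurrent.futures import ThreadPoolExecutor


def ordered_sum(values):
    """Add the values strictly from left to right."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk, workers):
    """Sum `values` chunk by chunk; chunks may be processed by several threads.

    The chunking depends only on `chunk`, never on `workers`, and the partial
    sums are combined in chunk order, so the arithmetic is the same for any
    number of workers.
    """
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    if workers <= 1:
        partial = [ordered_sum(p) for p in parts]
    else:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # map() returns the results in input order, not completion order.
            partial = list(pool.map(ordered_sum, parts))
    return ordered_sum(partial)
```

If the chunk boundaries were instead derived from the number of workers, the additions would be grouped differently for each worker count and the results could differ in the last bits.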
PERMAS is able to work with a constant, predefined amount of memory for each analysis. This also holds for a parallel execution of PERMAS. So, several simultaneous sequential jobs as well as several simultaneous parallel jobs or any mix of sequential and parallel jobs are possible.
The parallelization is based on a mathematical approach, which allows the automatic parallelization of sequentially programmed software. So, PERMAS remains generally portable and the main goal has been achieved: a single PERMAS version for all platforms.
Parallel PERMAS is available for all UNIX platforms on which a sequential version is also supported.
Due to the development of faster CPUs and higher I/O speeds in recent years, the gap to network speeds has widened. So, on distributed memory machines acceptable speed-ups using parallelization are more difficult to achieve. Consequently, for the time being shared memory architectures show much better speed-ups with PERMAS.
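
The effect of communication cost on the achievable speed-up can be made concrete with a simple model. The model below is an assumption introduced for illustration only (an Amdahl-style estimate with an added communication term); it is not a PERMAS measurement, and the parameter names and values are hypothetical.

```python
def speedup(processors, serial_fraction, comm_cost):
    """Estimated speed-up on `processors` processors.

    Assumed model, normalized to a sequential run time of 1:
      - `serial_fraction` of the work cannot be parallelized,
      - the remaining work is divided evenly among the processors,
      - every additional processor adds `comm_cost` of communication time.
    """
    parallel_time = (serial_fraction
                     + (1.0 - serial_fraction) / processors
                     + comm_cost * (processors - 1))
    return 1.0 / parallel_time
```

In this model a communication cost of zero corresponds to the shared memory case, while any positive cost lowers the speed-up for every processor count above one and eventually makes additional processors counterproductive.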
The parallel execution of PERMAS is very simple. Because there are no special commands necessary, a sequential run of PERMAS does not differ from a parallel one - except for the shorter run time. Only the number of parallel processes or processors for the PERMAS run has to be defined in advance.