Next: Conclusions Up: Results Previous: Symmetric Problem

Non-Symmetric Problem

Figure 5 shows how performance scales when the total problem size is fixed and the number of processors is varied, using PETSc on IA32. Under ideal scaling the curve would be proportional to $\frac{1}{P}$. The total problem size is fixed at 8 million unknowns and $P$ ranges from 1 to 256. Again, BiCGStab preconditioned with block Jacobi is the best choice.
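As a rough illustration of the $\frac{1}{P}$ reference curve, the sketch below computes the ideal time $T(1)/P$ and the parallel efficiency $T(1)/(P \cdot T(P))$ for fixed total problem size. It assumes the plotted quantity is solve time. The function names and the timings are hypothetical and are not taken from the measurements reported here.

```python
def ideal_times(t1, procs):
    """Ideal fixed-total-size scaling: T(P) = T(1) / P."""
    return [t1 / p for p in procs]


def parallel_efficiency(t1, procs, measured):
    """Efficiency E(P) = T(1) / (P * T(P)); a value of 1.0 means ideal scaling."""
    return [t1 / (p * t) for p, t in zip(procs, measured)]


# Hypothetical timings in seconds, for illustration only (not from this paper).
procs = [1, 2, 4, 8]
measured = [100.0, 52.0, 28.0, 16.0]

ideal = ideal_times(measured[0], procs)
eff = parallel_efficiency(measured[0], procs, measured)

for p, t, ti, e in zip(procs, measured, ideal, eff):
    print(f"P={p:3d}  measured={t:7.2f}s  ideal={ti:7.2f}s  efficiency={e:.2f}")
```

An efficiency that drops as $P$ grows corresponds to a measured curve that flattens out above the $\frac{1}{P}$ line.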

Figure 6 shows a cross-platform comparison between the IA32 cluster and the Origin2000 with fixed total problem size scaling. The much smaller theoretical peak flop rate of the Origin2000 is most evident in the single processor case and is less pronounced as $P$ increases.

Figure 7 is the non-symmetric version of Figure 3. Since this is a non-symmetric problem, the best non-multigrid method is now BiCGStab preconditioned with block Jacobi. Again it is evident that the multigrid preconditioned method scales far better than the others because of the superior algorithmic scaling of multigrid.

In Figure 8 the Linux cluster shows better single-processor performance than the SGI Origin2000 (about 15 to 20% faster). The Linux cluster scales very well as the number of processors increases, while the SGI Origin2000 does not scale as well. Comparing the two numerical techniques, GMRES with a multigrid preconditioner shows better single-processor performance. However, multigrid alone scales better, so it becomes more effective as the number of processors increases.

Table 1 demonstrates the scalability of different components of the solver using information from PETSc's built-in instrumentation. The tests were run on a non-symmetric problem using BiCGStab preconditioned with block Jacobi on IA32. Using the log files generated by PETSc, we have tabulated the percentage of time spent in a subset of the function calls made during a linear solve. PETSc inserts a barrier call in the dot-product function when profiling is turned on, and the VecDotBarrier times reflect the synchronization delay in the dot-products. MatMult scales well with the total solve time, but VecDotBarrier does not. This indicates that as the number of processors increases, VecDotBarrier increasingly becomes the bottleneck for good performance and scaling. On the other hand, the MatSolve portion of the solve generally decreases as the number of processors grows, due to the almost embarrassingly parallel nature of the solves on the individual blocks of the block Jacobi preconditioner.

Figure 1: Fixed problem size ($\frac{1}{4}$ million unknowns) per processor with a symmetric problem solved by PETSc with two different coarse grid solvers.
\includegraphics[angle=270,width=11.5cm]{fig1}

Figure 2: Fixed problem size ($\frac{1}{4}$ million unknowns) per processor with a symmetric problem solved by PETSc's standard preconditioned Krylov Subspace methods.
\includegraphics[angle=270,width=11.5cm]{fig2}

Figure 3: Fixed problem size ($\frac{1}{4}$ million unknowns) per processor with a symmetric problem solved by PETSc, comparing the multigrid preconditioner with a standard one.
\includegraphics[angle=270,width=11.5cm]{fig3}

Figure 4: Cross-platform comparison with hypre's GMRES/MG and multigrid alone on a symmetric problem with fixed problem size ($\frac{1}{4}$ million unknowns) per processor.
\includegraphics[angle=270,width=11.5cm]{fig4}

Figure 5: Fixed total problem size (8 million unknowns) on a non-symmetric problem solved by PETSc using standard preconditioned Krylov Subspace methods.
\includegraphics[angle=270,width=11.5cm]{fig5}

Figure 6: Fixed total problem size (8 million unknowns) on a non-symmetric problem solved by PETSc. Cross-platform comparison of BiCGStab/block Jacobi.
\includegraphics[angle=270,width=11.5cm]{fig6}

Figure 7: Fixed problem size ($\frac{1}{4}$ million unknowns) per processor with a non-symmetric problem solved by PETSc, comparing the multigrid preconditioner with a standard one.
\includegraphics[angle=270,width=11.5cm]{fig7}

Figure 8: Cross-platform comparison with hypre's GMRES/MG and multigrid alone on a non-symmetric problem with fixed problem size ($\frac{1}{4}$ million unknowns) per processor.
\includegraphics[angle=270,width=11.5cm]{fig8}


Table 1: Percent of total solve time spent in selected function calls during PETSc solve using BiCGStab with block Jacobi preconditioning on a non-symmetric problem with fixed problem size (8 million unknowns) on IA32.
Function        16 processors   32 processors   64 processors   128 processors
VecDotBarrier   16.13           19.46           26.33           32.57
VecDot          4.15            4.82            7.39            12.09
MatMult         50.9            52.63           56.0            50.64
VecAXPY         5.3             3.64            3.22            2.82
MatSolve        26.7            28.36           21.95           15.9
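
The trend in Table 1 can be made explicit by adding the two dot-product rows for each processor count. The sketch below does this with the percentages copied from the table. The helper name `combined_share` is hypothetical and is not part of PETSc.

```python
# Percentages copied from Table 1 (percent of total solve time).
procs = [16, 32, 64, 128]
table = {
    "VecDotBarrier": [16.13, 19.46, 26.33, 32.57],
    "VecDot": [4.15, 4.82, 7.39, 12.09],
    "MatMult": [50.9, 52.63, 56.0, 50.64],
    "VecAXPY": [5.3, 3.64, 3.22, 2.82],
    "MatSolve": [26.7, 28.36, 21.95, 15.9],
}


def combined_share(table, names):
    """Sum the listed rows column by column, rounded to two decimals."""
    return [round(sum(vals), 2) for vals in zip(*(table[n] for n in names))]


# Share of solve time attributed to dot-products, including the barrier wait.
dot_share = combined_share(table, ["VecDotBarrier", "VecDot"])

for p, d, m in zip(procs, dot_share, table["MatSolve"]):
    print(f"P={p:4d}  dot-products={d:6.2f}%  MatSolve={m:6.2f}%")
```

These values are shares of the total solve time, not absolute times, so a growing share indicates only that the dot-products shrink more slowly than the rest of the solve.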



John Fettig 2002-09-13