Scalable Amdahl's Law

For some classes of problems, the fraction of the operations that are vectorizable increases as the problem size increases. For example, consider a problem on an MxM grid. Suppose some preliminary scalar operations must be performed on the boundary nodes, requiring O(M) operations. Then vector calculations are performed on the interior nodes requiring O(M^2) operations.
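
As a rough count (a sketch, assuming one operation per node), an MxM grid has 4M - 4 boundary nodes but (M - 2)^2 interior nodes, so the scalar work grows like M while the vector work grows like M^2:

  Clear[m]
  boundaryNodes = 4 m - 4       (* O(m) nodes on the boundary *)
  interiorNodes = (m - 2)^2     (* O(m^2) nodes in the interior *)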

To simplify matters, let's assume that M scalar operations and M^2 vector operations are required. Then the total number of operations performed is

In[1]:=


  Clear[m]
  n = m + m^2

Out[1]=

  m + m^2

One vector processor of a Cray YMP C90 can perform two floating point adds per clock cycle and each clock cycle is 4 nanoseconds. So its vector speed, in MFLOPS, is

In[2]:=

  v = N[2 / (4 10^-9) * 10^-6]

Out[2]=

  500.

One processor operating in scalar mode can perform two floating point adds in 6 clock cycles, so its scalar speed, in MFLOPS, is

In[3]:=

  s = N[2 / (6 * 4 10^-9) * 10^-6]

Out[3]=

  83.3333

The time required to complete the algorithm, in microseconds, is

In[4]:=

  t = m^2/v + m/s

Out[4]=

  0.012 m + 0.002 m^2

The total performance, in MFLOPS, is

In[5]:=

  Clear[r]
  r[m_,v_,s_] = Simplify[n/t]

Out[5]=

  (500. (1 + m))/(6. + m)

The fraction of operations performed in vector mode is

In[6]:=

  Clear[f]
  f[m_] = m/(1. + m)

Out[6]=

  m/(1. + m)
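
As a consistency check (not part of the original session), r is simply the standard Amdahl's Law rate 1/((1-f)/s + f/v) with this problem-dependent fraction; evaluating the expression below should reproduce the formula for r obtained above.

  Simplify[1/((1 - f[m])/s + f[m]/v)]   (* standard Amdahl's Law rate; equals r[m,v,s] *)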

We can construct a table for various values of m.

In[7]:=

  mvalues = {0,5,10,20,40,80,160,320};

In[8]:=

  Prepend[
     Map[{#, r[#,v,s], f[#]}&, mvalues],
     {"m", "r", "f"}]//TableForm
  

Out[8]=

  m     r         f

  0     83.3333   0
  5     272.727   0.833333
  10    343.75    0.909091
  20    403.846   0.952381
  40    445.652   0.97561
  80    470.93    0.987654
  160   484.94    0.993789
  320   492.331   0.996885

We can also graph r as a function of m.

In[9]:=

  Plot[r[m,v,s], {m,0,320}];

Notice that with a small grid of only 10x10 points, the fraction of operations in vector mode is already 0.91. With a 40x40 grid the fraction is nearly 0.98. This is a much more encouraging picture than Amdahl's Law suggested.
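
A quick follow-up calculation (a sketch using the definitions above) shows how fast the rate itself climbs toward the 500 MFLOPS peak:

  Solve[r[m, v, s] == 0.9 v, m]   (* 90% of the peak vector speed near m = 44 *)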

In summary, Amdahl's Law tells us that we get miserable performance unless nearly all of the operations are vectorizable. Scalable Amdahl's Law tells us that for certain classes of problems the vectorizable fraction grows with the problem size, so that performance approaches the full vector speed.
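
As a rough illustration (not part of the original session), the scalable rate approaches the full vector speed as the grid grows, while holding the vector fraction fixed at, say, f = 0.5 pins the rate far below it:

  Limit[r[m, v, s], m -> Infinity]   (* approaches v = 500 MFLOPS *)
  1/((1 - 0.5)/s + 0.5/v)            (* fixed f = 0.5: roughly 143 MFLOPS *)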
