To simplify matters, let's assume that m scalar operations and m^2 vector operations are required. Then the total number of operations performed is
In[1]:=
Clear[m]
n = m + m^2
Out[1]=
m + m^2
One vector processor of a Cray Y-MP C90 can perform two floating point adds per clock cycle, and each clock cycle is 4 nanoseconds. So its vector speed, in MFLOPS, is
In[2]:=
v = N[2 / (4 10^-9) * 10^-6]
Out[2]=
500.
One processor operating in scalar mode can perform two floating point adds in 6 clock cycles, so its scalar speed, in MFLOPS, is
In[3]:=
s = N[2 / (6 * 4 10^-9) * 10^-6]
Out[3]=
83.3333
The time required to complete the algorithm is
In[4]:=
t = m^2/v + m/s
Out[4]=
0.012 m + 0.002 m^2
The total performance, in MFLOPS, is
In[5]:=
Clear[r]
r[m_,v_,s_] = Simplify[n/t]
Out[5]=
500. (1 + m) / (6. + m)
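Note that as m grows, the denominator 6. + m is dominated by m, so r approaches the 500 MFLOPS vector peak. The following Python sketch is an independent check of the same arithmetic (it is not part of the original notebook session; the names v, s, r mirror the definitions above):

```python
# Independent check of the notebook's performance model.
# v and s are the vector and scalar speeds in MFLOPS derived above.
v = 2 / (4e-9) * 1e-6      # 500 MFLOPS (two adds per 4 ns cycle)
s = 2 / (6 * 4e-9) * 1e-6  # 83.333... MFLOPS (two adds per 6 cycles)

def r(m):
    """Total performance in MFLOPS: (m + m^2) operations divided by time t."""
    n = m + m**2
    t = m**2 / v + m / s
    return n / t

print(r(10))   # 343.75, matching the value in the table below at m = 10
print(r(1e6))  # very close to the 500 MFLOPS vector peak
```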
The fraction of operations performed in vector mode is
In[6]:=
Clear[f]
f[m_] = m/(1. + m)
Out[6]=
m / (1. + m)
We can construct a table for various values of m.
In[7]:=
mvalues = {0,5,10,20,40,80,160,320};
In[8]:=
Prepend[
Map[{#, r[#,v,s], f[#]}&, mvalues],
{"m", "r", "f"}]//TableForm
Out[8]=
m     r         f
0     83.3333   0
5     272.727   0.833333
10    343.75    0.909091
20    403.846   0.952381
40    445.652   0.97561
80    470.93    0.987654
160   484.94    0.993789
320   492.331   0.996885
and we can graph r as a function of m.
In[9]:=
Plot[r[m,v,s], {m,0,320}];

Notice that with a small grid of only 10x10 points, the fraction of operations performed in vector mode is already 0.91. With a 40x40 grid the fraction is 0.98. This is a much more encouraging picture than Amdahl's Law suggested.
In summary, Amdahl's Law tells us that we get miserable performance unless most of the operations are vectorizable. Scalable Amdahl's Law tells us that for certain types of problems, the fraction of vectorizable operations grows with the problem size, so that performance is near optimal.
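The connection between the two statements can be made explicit: the average time per operation is (1 - f)/s + f/v, so the performance r[m] above is exactly Amdahl's Law evaluated at the vector fraction f[m] = m/(1 + m), which grows toward 1 with the problem size. A Python sketch of this identity (again an independent check, with variable names of my choosing):

```python
# Amdahl's Law: performance for a vector fraction f, given scalar speed s
# and vector speed v in MFLOPS (the Cray Y-MP C90 figures from above).
v, s = 500.0, 250.0 / 3.0  # 500 and 83.333... MFLOPS

def amdahl(f):
    """MFLOPS when a fraction f of all operations runs at vector speed."""
    return 1.0 / ((1.0 - f) / s + f / v)

def f(m):
    """Vector fraction for this workload: m^2 vector ops out of m + m^2."""
    return m / (1.0 + m)

# Classic Amdahl's Law with f fixed at 0.5: performance is stuck far
# below the 500 MFLOPS vector peak.
print(amdahl(0.5))          # about 142.9 MFLOPS

# Scaled view: f(m) -> 1 as the grid grows, so performance approaches v.
for m in (10, 40, 320):
    print(m, amdahl(f(m)))  # 343.75, 445.65..., 492.33..., as in the table
```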