[mesa-users] mesa/star with multicore linear algebra gives different results than with single core
paxton at kitp.ucsb.edu
Mon Jun 25 12:35:54 EDT 2012
OK, here is some information to help you understand what's going on.
I'm cc'ing mesa-users since this is a question of general interest.
background for mesa-users:
the identical mesa/star problem, run on the same machine with the same compiler, etc.,
but with a different number of cores, gives different results. why?
when star is run with serial linear algebra (e.g., block_thomas_dble),
the differences go away -- i.e., the results are then independent of the number of cores.
mesa/star makes use of multicores as much as possible.
for microphysics (eos, kap, net), it can do each zone in the star in parallel
and so can use as many cores as you have with very good efficiency
and no numerical effects (modulo bugs).
we do such a good job on the microphysics that the linear algebra becomes
the major time consumer. however, when you break up the linear algebra, you
change the order of the numeric operations, and because floating point has
limited precision, that will change the result. usually the change
is so small you don't care. but if the matrices are ill-conditioned (as ours
often are), the effect can be large enough to make you notice.
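A minimal Python sketch of the two effects just described, using made-up numbers rather than anything from MESA: floating-point addition is not associative, and an ill-conditioned system turns a tiny change in its input into a large change in its solution. The 2x2 system and the Cramer's-rule solver are illustrative choices for brevity, not what mesa/star uses.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve [[a11, a12], [a21, a22]] x = b by Cramer's rule (illustration only)."""
    det = a11 * a22 - a12 * a21
    x1 = (b1 * a22 - a12 * b2) / det
    x2 = (a11 * b2 - a21 * b1) / det
    return x1, x2

# 1. Changing the order of the additions changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# 2. A nearly singular (ill-conditioned) matrix: its two rows almost coincide.
a11, a12, a21, a22 = 1.0, 1.0, 1.0, 1.0001
x = solve_2x2(a11, a12, a21, a22, 2.0, 2.0001)  # exact solution is (1, 1)
y = solve_2x2(a11, a12, a21, a22, 2.0, 2.0002)  # b2 changed by about 0.005%
print(x)  # close to (1, 1)
print(y)  # close to (0, 2): the solution moved by order unity
```

A round-off-sized difference in the order of operations plays the same role as the small change to b2 here: the worse the conditioning, the more it is amplified.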
the block_thomas_dble linear algebra routine is serial -- so you
get no benefit from multiple cores, but you get the exact same
answer independent of the number of cores (NOTE: not the "right"
answer, just the same answer)
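To see why a Thomas-style solver is serial, here is a sketch of the textbook scalar Thomas algorithm for a tridiagonal system. This is an assumption made for illustration: block_thomas_dble works on blocks of variables per zone rather than scalars, and this is not its code. Every step of the forward sweep needs the result of the step before it, so the arithmetic always happens as one chain in a fixed order, and the round-off is therefore the same however many cores are available.

```python
def thomas_solve(lower, diag, upper, rhs):
    """Solve a tridiagonal system with the scalar Thomas algorithm.

    lower[i] multiplies x[i-1] in row i (lower[0] is unused),
    diag[i] multiplies x[i], upper[i] multiplies x[i+1] (upper[-1] is unused).
    """
    n = len(diag)
    c = [0.0] * n  # modified upper diagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    # Forward sweep: row i uses row i-1, so this loop is inherently sequential.
    for i in range(1, n):
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    # Back substitution: x[i] uses x[i+1], again a strict sequence.
    x = [0.0] * n
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


# Example: 2 on the diagonal, -1 off it; the exact solution is all ones.
solution = thomas_solve([0.0, -1.0, -1.0, -1.0],
                        [2.0, 2.0, 2.0, 2.0],
                        [-1.0, -1.0, -1.0, 0.0],
                        [1.0, 0.0, 0.0, 1.0])
print(solution)  # each entry equals 1.0 up to round-off
```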
the block_dc_mt_dble routine is a parallel algorithm, so you
get a good benefit from multicores, but the answer will
depend on the number of cores since the order of
arithmetic operations will change and round off errors will
be different. (NOTE: this type of uncertainty is present
in most (all?) programs that make use of parallelism;
it is unusual that we can avoid it in the input physics,
and that is only because the separate cells are completely
independent at that level.)
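The same effect shows up without any matrices. The Python sketch below (an illustration, not MESA code) imitates a parallel sum: the data are split into one contiguous chunk per hypothetical worker, each chunk is summed, and the partial sums are then combined. For a fixed worker count the answer never changes, but it does change with the worker count, and none of the answers has to match the correctly rounded sum from math.fsum.

```python
import math

def chunked_sum(values, n_workers):
    """Imitate a parallel reduction: one contiguous chunk per worker,
    each chunk summed left to right, then the partial sums combined."""
    n = len(values)
    bounds = [n * k // n_workers for k in range(n_workers + 1)]
    partials = []
    for lo, hi in zip(bounds[:-1], bounds[1:]):
        s = 0.0
        for v in values[lo:hi]:
            s += v
        partials.append(s)
    total = 0.0
    for s in partials:
        total += s
    return total

# Values chosen so that adding 1.0 to +/-1e17 is lost to rounding.
values = [1e17, 1.0, -1e17, 1.0]
for workers in (1, 2, 3, 4):
    print(workers, chunked_sum(values, workers))
# prints "1 1.0", "2 0.0", "3 0.0", "4 1.0", one pair per line
print("correctly rounded:", math.fsum(values))  # correctly rounded: 2.0
```

Real threaded solvers do not necessarily split their work this way. The point is only that the grouping, and therefore the rounding, follows the number of workers.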
Rather than give up on using the multiple cores for the
linear algebra (please don't!) I suggest that you compare
converged results for different number of cores to see
how big the effect is for your particular problem.
By "converged" result I mean what you get for a fixed
number of cores but with increasing resolution in
time and space -- more timesteps and more grid points.
Keep increasing the number of timesteps and number
of grid points until the results settle down (they are
not likely to converge to lots of decimal digits,
but you should go far enough that you know the size
of the "noise" in the results that seems
to be independent of resolution).
Then repeat that convergence with a different number
of cores and see how much the converged results
differ. That's the critical piece of information we need.
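The procedure above can be sketched as a small driver loop. Everything in it is hypothetical: run(n_cores, resolution) stands in for a full MESA run that returns one scalar of interest, a single "resolution" number stands in for both the number of timesteps and the number of grid points, and the toy model simply approaches a limit with a tiny core-dependent offset added.

```python
def converged_result(run, n_cores, resolutions, rel_tol):
    """Raise the resolution until two successive results agree to rel_tol.

    Returns (result, last_change), where last_change is the difference
    between the final two resolutions: a crude estimate of how far the
    result still is from settling.
    """
    previous = None
    for res in resolutions:
        value = run(n_cores, res)
        if previous is not None:
            change = abs(value - previous)
            if change <= rel_tol * abs(value):
                return value, change
        previous = value
    raise RuntimeError("results did not settle; extend the resolution list")


def toy_run(n_cores, resolution):
    # Hypothetical stand-in for a MESA run: tends to 1.0 as resolution
    # grows, plus a tiny offset that depends on the core count.
    return 1.0 + 1.0 / resolution + 1e-9 * n_cores


resolutions = [10, 100, 1000, 10_000, 100_000]
v1, noise1 = converged_result(toy_run, 1, resolutions, rel_tol=1e-4)
v8, noise8 = converged_result(toy_run, 8, resolutions, rel_tol=1e-4)
# The key comparison: is the core-to-core difference below the noise level?
print(abs(v1 - v8), max(noise1, noise8))
```

If the difference between core counts is well below the remaining resolution noise, the multicore variation does not matter for that problem; if it is comparable or larger, that is exactly the case worth reporting.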
I'll be very interested to learn what you find.
Of course, there is also the real possibility that I just have
a bug in block_dc_mt_dble. To test that, please try using
a new alternative that also does multicore block tridiagonal solves:
small_mtx_decsol = 'block_trisolve_dble'
I would expect it to show the same small variations when
you change number of cores -- does it? Does it give
answers similar to the other solvers?
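The quoted defaults further down pick a solver from the problem size using hydro_decsol_switch. Here is a hypothetical Python restatement of the rule described in those inlist comments (the function is illustrative and not part of MESA; nvar = nvar_hydro + species, as stated there), useful for checking which solver a given run will actually use.

```python
def choose_decsol(nvar_hydro, species,
                  hydro_decsol_switch=15,
                  small_mtx_decsol='block_dc_mt_dble',
                  large_mtx_decsol='block_dc_mt_klu'):
    """Hypothetical restatement of the hydro_decsol_switch rule:
    use small_mtx_decsol when nvar <= switch, otherwise large_mtx_decsol."""
    nvar = nvar_hydro + species
    if nvar <= hydro_decsol_switch:
        return small_mtx_decsol
    return large_mtx_decsol


# Defaults: a small network stays on block_dc_mt_dble, a larger one
# moves to block_dc_mt_klu (the nvar_hydro value here is made up).
print(choose_decsol(nvar_hydro=4, species=8))   # block_dc_mt_dble
print(choose_decsol(nvar_hydro=4, species=21))  # block_dc_mt_klu
# Suggested test settings: with the switch at 150, 'lapack' covers nvar <= 150.
print(choose_decsol(4, 21, hydro_decsol_switch=150, small_mtx_decsol='lapack'))  # lapack
```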
On Jun 25, 2012, at 7:08 AM, Jérôme Quintin wrote:
> Thanks for finding this out, Theodore, because I change the number of cores quite often on the cluster. I will try the test on my side as well today (I also have access to nodes with 24 cores on the new cluster). I will keep you posted on my results.
> From: Theodore Arthur Sande [tasande at mit.edu]
> Sent: June 24, 2012 10:41 PM
> To: Jérôme Quintin
> Subject: Fwd: Re:
> We're working on this...
> ----- Forwarded message from paxton at kitp.ucsb.edu -----
> Date: Sun, 24 Jun 2012 10:05:48 -0700
> From: Bill Paxton <paxton at kitp.ucsb.edu>
> Reply-To: Bill Paxton <paxton at kitp.ucsb.edu>
> Subject: Re:
> To: Theodore Arthur Sande <tasande at mit.edu>
> Thanks for checking this! It might well be a bug.
> Please check a few things for me. Are you using these defaults?
> hydro_decsol_switch = 15
> ! if current nvar <= switch, (recall nvar = nvar_hydro + species)
> ! then use small_mtx_decsol for current step,
> ! else use large_mtx_decsol.
> small_mtx_decsol = 'block_dc_mt_dble'
> large_mtx_decsol = 'block_dc_mt_klu'
> If so, please try this alternative and check if the problem goes away.
> hydro_decsol_switch = 150
> small_mtx_decsol = 'lapack'
> The lapack algorithm is unchanged by the number of cores
> (i.e., it doesn't benefit from multicore parallelism),
> whereas the "block_dc_mt" algorithms depend strongly on
> the number of cores to give pretty good parallelism.
> The block parallel algorithms will factor the large linear
> algebra problem differently depending on the number of cores.
> Because of ill-conditioning in the matrices, that could give
> different answers for different factorings. That would be sad.
> Keep me informed.
> On Jun 24, 2012, at 12:32 AM, Theodore Arthur Sande wrote:
>> Hello Bill,
>> I have a question that I thought I would send to you directly. I have started
>> running mesa on a supercomputer in Quebec through the kind offer of Lorne
>> Nelson. Today, as an experiment, I decided to test execution times using
>> 1-8 cores on a single CPU. To my surprise, the star.log files are slightly
>> different (e.g., ~0.1%).
>> (I had previously been using just 1 core as I learned how to use the
>> I assume that this should not be occurring. I will test this out on another
>> computer presently.
>> One of Lorne's students will be helping me to investigate this. I am using
>> 4028 and intel64/188.8.131.523. I specify the number of cores with the first line in rn, e.g.,
>> export OMP_NUM_THREADS=4.
>> I may be missing something obvious, but I thought I would share this with you.
>> If this is easy to rectify, I will let you know what Lorne's student, Jerome
>> Quintin, has to say.
>> Thanks again,
>> Theodore Arthur Sande
>> MIT Department of Physics
>> tasande at mit.edu
> ----- End forwarded message -----
> Theodore Arthur Sande
> MIT Department of Physics
> tasande at mit.edu