[mesa-users] Convergence

Bill Paxton paxton at kitp.ucsb.edu
Fri May 20 14:04:39 EDT 2016


This is a good teaching opportunity. ;)   let's look into the use of the 'report_hydro_solver_progress' control that Rob mentioned.

This example is from the 1M_pre_ms_to_wd test case when it has reached the helium core flash at the tip of the RGB.  the timestep has to be drastically cut down to get through the flash, but the standard timestep controls haven't done the job.  the result is a short burst of retries and backups to reduce the timestep the hard way.

let's use report_hydro_solver_progress to look at details for the 1st backup.  for this example, i've modified the controls to make the newton do more iterations before giving up.   look in controls.defaults if you need a reminder of what these controls do.

      newton_iterations_limit = 18 ! this is used for setting timesteps
      max_tries = 19
      iter_for_resid_tol2 = 18

Here's the terminal output for the newton failure at step 2002: (best viewed with fixed pitch font such as Courier so that the columns line up)

  2002    1  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  7.848E-04  max resid  1.680E+01  avg corr  1.147E-02  max excess  1.402E+02  lg dt/yr  2.51  avg+max corr+resid
  2002    2  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  4.176E-04  max resid  6.278E+00  avg corr  6.388E-03  max excess  4.192E+01  lg dt/yr  2.51  avg+max corr+resid
  2002    3  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  2.049E-01  max resid  6.015E+03  avg corr  5.934E-05  max excess  2.805E+00  lg dt/yr  2.51  avg+max corr+resid
  2002    4  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  1.143E-01  max resid  1.753E+03  avg corr  1.628E-03  max excess  9.842E+00  lg dt/yr  2.51  avg+max corr+resid
  2002    5  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  4.516E-02  max resid  7.629E+02  avg corr  7.307E-05  max excess  4.015E+00  lg dt/yr  2.51  avg+max corr+resid
  2002    6  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  1.417E-02  max resid  2.227E+02  avg corr  2.194E-04  max excess  1.288E+00  lg dt/yr  2.51  avg+max corr+resid
  2002    7  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  4.096E-03  max resid  5.859E+01  avg corr  2.998E-04  max excess  1.803E+00  lg dt/yr  2.51  avg+max corr+resid
  2002    8  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  1.387E-03  max resid  2.024E+01  avg corr  8.025E-05  max excess  9.389E-01  lg dt/yr  2.51  avg corr, avg+max resid
  2002    9  coeff  1.0000  slope -6.254E+02  f  5.238E+01  avg resid  3.802E-04  max resid  1.021E+01  avg corr  5.443E-05  max excess  4.255E-01  lg dt/yr  2.51  avg corr, avg+max resid
  2002   10  coeff  0.9621  slope  0.000E+00  f  5.995E+02  avg resid  1.278E-03  max resid  3.448E+01  avg corr  8.712E-05  max excess  1.691E+00  lg dt/yr  2.51  avg+max corr+resid
  2002   11  coeff  1.0000  slope -8.572E+01  f  5.978E+00  avg resid  2.268E-04  max resid  2.692E+00  avg corr  1.091E-04  max excess  1.154E+00  lg dt/yr  2.51  avg+max corr+resid
  2002   12  coeff  0.2192  slope -2.794E+02  f  5.321E+00  avg resid  2.014E-04  max resid  2.760E+00  avg corr  2.220E-05  max excess  1.007E+00  lg dt/yr  2.51  avg+max corr+resid
  2002   13  coeff  0.1000  slope -2.290E+02  f  1.077E+01  avg resid  2.637E-04  max resid  4.175E+00  avg corr  2.230E-04  max excess  2.738E+00  lg dt/yr  2.51  avg+max corr+resid
  2002   14  coeff  0.1000  slope -2.644E+02  f  1.092E+01  avg resid  2.782E-04  max resid  4.037E+00  avg corr  1.942E-04  max excess  2.411E+00  lg dt/yr  2.51  avg+max corr+resid
  2002   15  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  5.678E-04  max resid  9.860E+00  avg corr  8.524E-05  max excess  9.208E-01  lg dt/yr  2.51  avg corr, avg+max resid
  2002   16  coeff  0.2000  slope -4.362E+01  f  4.013E+01  avg resid  4.675E-04  max resid  8.253E+00  avg corr  7.040E-05  max excess  8.586E-01  lg dt/yr  2.51  avg corr, avg+max resid
  2002   17  coeff  0.2000  slope -3.361E+01  f  3.153E+01  avg resid  4.028E-04  max resid  7.418E+00  avg corr  7.252E-05  max excess  1.050E+00  lg dt/yr  2.51  avg+max corr+resid
  2002   18  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  2.504E+00  max resid  7.689E+04  avg corr  2.254E-04  max excess  4.758E+00  lg dt/yr  2.51  avg+max corr
  2002   19  coeff  0.0309  slope  0.000E+00  f  2.774E+09  avg resid  2.427E+00  max resid  7.449E+04  avg corr  4.662E-03  max excess  7.777E+01  lg dt/yr  2.51  avg+max corr -- give up
 hydro_newton_step failed to converge

The first thing to notice is the "failed to converge" message in the last line.  Please understand that "convergence" in the context of the newton solver means finding an acceptable new model.

The step number is in column 1, the iteration number in column 2.  We've set max_tries = 19, so it gives up after that many iterations.

The last column text indicates why the iteration was rejected.  Since I've set iter_for_resid_tol2 = 18, it is considering residuals up to iteration 17 and finding them too large.
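
As a rough illustration of that bookkeeping, here is a toy sketch in Python of an acceptance test that stops looking at residuals once the iteration count reaches iter_for_resid_tol2.  This is not MESA's code: the function name and the three tolerance values are placeholders, picked only so that the toy reproduces the flags in the output quoted in this message, and the real solver has more controls than this (see controls.defaults).

```python
def rejection_reasons(iteration, avg_resid, max_resid, avg_corr, max_excess,
                      iter_for_resid_tol2=18,
                      tol_avg_resid=1e-10,   # placeholder value, not a MESA default
                      tol_max_resid=1e-7,    # placeholder value, not a MESA default
                      tol_avg_corr=1e-5):    # placeholder value, not a MESA default
    """Return the list of quantities that are still too large (empty list = accept)."""
    reasons = []
    # Corrections are checked at every iteration.
    if avg_corr > tol_avg_corr:
        reasons.append("avg corr")
    if max_excess > 1.0:  # max excess = max correction / tol_max_correction
        reasons.append("max corr")
    # Residuals are only checked before iteration iter_for_resid_tol2.
    if iteration < iter_for_resid_tol2:
        if avg_resid > tol_avg_resid:
            reasons.append("avg resid")
        if max_resid > tol_max_resid:
            reasons.append("max resid")
    return reasons


# Numbers copied from iterations 17 and 18 of the failed attempt above.
print(rejection_reasons(17, 4.028e-04, 7.418e+00, 7.252e-05, 1.050e+00))
print(rejection_reasons(18, 2.504e+00, 7.689e+04, 2.254e-04, 4.758e+00))
```

At iteration 17 all four quantities are reported; at iteration 18 the residuals are even larger but only the corrections are listed, which matches the change in the last column of the output.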

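The avg corrections are too large for every iteration, and the max corrections for all but a few of them (max excess only dips below 1 at iterations 8, 9, 15 and 16).  In fact the corrections have stopped improving after about 3 iterations.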

btw: the "max excess" is equal to the max correction divided by the tol_max_correction, so anything > 1 is bad.  (the current public version of mesa shows max_corr in this column).
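
If you want to look at these numbers in a script rather than by eye, a line of this output can be pulled apart with a regular expression.  The sketch below assumes exactly the column labels shown in this message (other versions of mesa print different columns, as noted above); parse_progress_line is a made-up helper name, not part of MESA.

```python
import re

# Layout assumed from the output quoted in this message:
# step, iteration, then labelled numbers, then a free-text message.
LINE_RE = re.compile(
    r"^\s*(?P<step>\d+)\s+(?P<iter>\d+)"
    r"\s+coeff\s+(?P<coeff>\S+)"
    r"\s+slope\s+(?P<slope>\S+)"
    r"\s+f\s+(?P<f>\S+)"
    r"\s+avg\s+resid\s+(?P<avg_resid>\S+)"
    r"\s+max\s+resid\s+(?P<max_resid>\S+)"
    r"\s+avg\s+corr\s+(?P<avg_corr>\S+)"
    r"\s+max\s+excess\s+(?P<max_excess>\S+)"
    r"\s+lg\s+dt/yr\s+(?P<lg_dt_yr>\S+)"
    r"\s+(?P<message>.*\S)\s*$"
)


def parse_progress_line(line):
    """Return a dict of the values on one progress line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    rec = {}
    for key, text in m.groupdict().items():
        if key in ("step", "iter"):
            rec[key] = int(text)
        elif key == "message":
            rec[key] = text
        else:
            rec[key] = float(text)
    return rec


sample = ("  2002    9  coeff  1.0000  slope -6.254E+02  f  5.238E+01"
          "  avg resid  3.802E-04  max resid  1.021E+01  avg corr  5.443E-05"
          "  max excess  4.255E-01  lg dt/yr  2.51  avg corr, avg+max resid")
rec = parse_progress_line(sample)
# max excess > 1 means the max correction is over tolerance
print(rec["iter"], rec["max_excess"], rec["max_excess"] > 1.0, rec["message"])
```

Lines that are not progress lines, such as the final "hydro_newton_step failed to converge", simply return None.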

Looking at the values for avg and max, for residuals and corrections, you can see that the newton iterations are not leading to improvements -- things are actually getting worse.

This is because we are trying to take too large a timestep, and the assumption of near linear response of residuals to corrections is invalid.

More iterations won't help; the only solution is to retry with a smaller timestep.  (here the retry drops lg dt/yr from 2.51 to 2.20, i.e. it roughly halves the timestep)
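
The retry idea can be mimicked with a toy problem.  The sketch below is not the stellar equations and not MESA's solver: it applies newton iterations to a single backward-Euler equation, y = y_old + dt*exp(y), chosen only because it is easy to analyze.  The function names, the tolerance, and the choice to halve dt on each retry are all assumptions made for the illustration.  One caveat: for the larger dt used here the toy equation has no real solution at all, which is a more extreme situation than a bad linearization, but the symptom (iterations that never settle down) and the cure (a smaller timestep) are the same.

```python
import math


def newton_backward_euler(y_old, dt, max_tries=19, tol_corr=1e-10):
    """Try to solve y = y_old + dt*exp(y) by newton iterations.

    Returns (accepted, y, iterations_used).
    """
    y = y_old  # start from the previous model
    for iteration in range(1, max_tries + 1):
        resid = y - y_old - dt * math.exp(y)   # how badly the equation is violated
        dresid_dy = 1.0 - dt * math.exp(y)     # the 1x1 "Jacobian"
        corr = -resid / dresid_dy              # nonzero denominator for the dt values used here
        y += corr
        if abs(corr) < tol_corr:               # accept on a small correction
            return True, y, iteration
    return False, y, max_tries                 # give up


def take_step(y_old, dt, max_retries=5):
    """Retry with half the timestep until the newton iterations are accepted."""
    for _ in range(max_retries + 1):
        accepted, y, iterations = newton_backward_euler(y_old, dt)
        if accepted:
            return y, dt, iterations
        dt *= 0.5  # assumed retry factor for this toy
    raise RuntimeError("no acceptable solution even after retries")


y_new, dt_used, iterations = take_step(0.0, 0.5)
print("accepted with dt =", dt_used, "after", iterations, "iterations, y =", y_new)
```

With y_old = 0 and dt = 0.5 the iterates just bounce between 0 and 1 and nothing is accepted in 19 tries; at dt = 0.25 the corrections shrink rapidly and the step is accepted after a handful of iterations.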

When that timestep reduction happens, we see this output from report_hydro_solver_progress for the same starting model:

  2002    1  coeff  1.0000  slope -8.472E+01  f  1.282E+00  avg resid  7.291E-05  max resid  1.561E+00  avg corr  5.959E-03  max excess  8.593E+01  lg dt/yr  2.20  avg+max corr+resid
  2002    2  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  2.364E-06  max resid  4.573E-02  avg corr  3.225E-03  max excess  2.139E+01  lg dt/yr  2.20  avg+max corr+resid
  2002    3  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  1.969E-06  max resid  5.548E-02  avg corr  8.639E-05  max excess  5.873E-01  lg dt/yr  2.20  avg corr, avg+max resid
  2002    4  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  2.094E-09  max resid  4.294E-05  avg corr  1.125E-06  max excess  7.163E-03  lg dt/yr  2.20  avg+max resid
  2002    5  coeff  1.0000  slope  0.000E+00  f  0.000E+00  avg resid  1.349E-12  max resid  3.595E-09  avg corr  2.432E-08  max excess  1.553E-04  lg dt/yr  2.20  okay!

That's more like it!   That's the sort of "convergence" you want to see.  Note the rapid drop at each iteration in both residuals and corrections.
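
One simple way to put numbers on the difference between the two attempts is to compare how the max resid column evolves.  The sketch below just copies that column from the two outputs quoted in this message; the helper names are made up for the illustration.

```python
# max resid per iteration, copied from the output quoted in this message
max_resid_lg_dt_251 = [
    1.680e+01, 6.278e+00, 6.015e+03, 1.753e+03, 7.629e+02, 2.227e+02,
    5.859e+01, 2.024e+01, 1.021e+01, 3.448e+01, 2.692e+00, 2.760e+00,
    4.175e+00, 4.037e+00, 9.860e+00, 8.253e+00, 7.418e+00, 7.689e+04,
    7.449e+04,
]
max_resid_lg_dt_220 = [1.561e+00, 4.573e-02, 5.548e-02, 4.294e-05, 3.595e-09]


def count_worse(values):
    """Number of iterations whose value is larger than the one before it."""
    return sum(later > earlier for earlier, later in zip(values, values[1:]))


def overall_factor(values):
    """Last value divided by first value; below 1 means a net improvement."""
    return values[-1] / values[0]


for label, series in [("lg dt/yr 2.51", max_resid_lg_dt_251),
                      ("lg dt/yr 2.20", max_resid_lg_dt_220)]:
    print(label, "| iterations that got worse:", count_worse(series),
          "| last/first: %.3g" % overall_factor(series))
```

For the failed attempt the max residual ends up a few thousand times larger than where it started; for the retry it drops by more than eight orders of magnitude in five iterations.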

So why not always require the residuals to get small just as we require the corrections to get small?

The sad truth is that we run into cases where small corrections lead to large changes in residuals.  The accuracy of the partials going into the Jacobian is not good enough to let us find the exact correction that would make the residual get small.  The result can be "stalled" residuals that don't keep dropping.  The corrections may have gotten small enough to satisfy the tolerances, but even such small corrections are not able to get the small residuals we would like to see.  The ideal answer to this problem is to improve the partials!   trust me, we tried and are continuing to work on this.  but in the meantime, the standard solution (shared with other stellar codes) is to stop considering the residuals and settle for just getting small corrections even if that doesn't give you small residuals.

NOTE:  this means that we can and do end up accepting new models that can have substantial residuals, and that means the models have substantial deviations from providing "correct" solutions to the stellar equations.  gasp!  that's horrible!  how can any of this work?  good question.  what's your answer?

-Bill









On May 20, 2016, at 9:55 AM, Robert Farmer wrote:

> > Also is there a way to figure out after the runs have completed to determine how well the convergence was in reality?
> 
> If you haven't already found it, there is this option in controls:
> 
> report_hydro_solver_progress = .true.
> 
> Which will give you information about the newton iterations as the run progresses and how good the acceptance was.
> Rob
> 
> On Fri, May 20, 2016 at 9:24 AM, Bill Paxton <paxton at kitp.ucsb.edu> wrote:
> 
> On May 20, 2016, at 9:05 AM, Kenny Van wrote:
> 
>> One question I had: does tol_max_correction guarantee that convergence is always no worse than that? Also is there a way to figure out after the runs have completed to determine how well the convergence was in reality?
> 
> Hi,
> 
> First we need to be careful about the meaning of "convergence"
> 
> 1) we speak of the newton iterations converging to a solution for the new model at the end of a timestep
> 
> and
> 
> 2) we also use convergence to mean the final results of a run converging to (roughly) the same values as the tolerances are tightened, forcing more timesteps and more zones.
> 
> the 1st kind is done at each timestep, the 2nd is done before publishing (or sooner!) by expert users to check the chance that their results are numerical artifacts of inadequate time or space resolution.   there is of course the danger that by reducing timesteps, you'll open up new physics that you would rather stayed hidden (surface pulsations for example).   so this process cannot be pushed too far, but it also should not be neglected.
> 
> 
> for newton iterations, i try to avoid the term "convergence" and talk about "acceptance" instead.  the newton generates a series of trial solutions.  each is checked for how well it satisfies the equations ("residuals") and how small the difference is from the previous trial solution ("corrections").
> 
> acceptance must happen within a specified number of trials, or the effort is stopped and the system must do a retry with a smaller timestep.
> 
> both the residuals and the corrections can be considered in deciding whether or not to accept a trial solution.
> 
> in many cases, it is desirable to stop checking residuals after a specified number of iterations and just use corrections to decide acceptance.
> 
> these options are given in controls.defaults, so that's where you need to go next.  search for "solver controls" and read on.
> 
> -Bill
> 
> _______________________________________________
> mesa-users mailing list
> mesa-users at lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/mesa-users
