[Mesa-users] Pointer error with free_star
Warrick Ball
wball at bison.ph.bham.ac.uk
Tue Dec 19 09:41:45 EST 2017
Hi Rob,
Wow, thanks! I'm phenomenally grateful that someone who knows what
they're doing looked at this. I'll make my way through the details when I
next have a chance and try it out in our application.
At first glance though, should I/we expect the same thing to happen with
s% chem_id? Like s% net_iso, it also appears in both
star/private/alloc.f90:free_hydro and net/public/net_def.f90:do_free_net.
I guess I'll find out when I try commenting out the deallocate.
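
The ownership pattern in Rob's explanation (quoted below) can be sketched in a
few lines of Fortran. This is only a toy illustration under stated assumptions,
not MESA code: the types `net_info` and `star_info` and the program name are
made up, and the real `s` and `g` structures contain far more than one array.
The idea is that the net owns the storage, the star only holds an alias, and so
the star-side cleanup should nullify without deallocating.

```fortran
program alias_free_sketch
   implicit none

   ! Hypothetical stand-ins for MESA's net and star structures.
   type net_info
      integer, pointer :: net_iso(:) => null()
   end type net_info

   type star_info
      integer, pointer :: net_iso(:) => null()
   end type star_info

   type(net_info) :: g
   type(star_info) :: s

   allocate(g% net_iso(100))   ! the net owns the storage
   g% net_iso = 0
   s% net_iso => g% net_iso    ! the star only aliases it

   ! Non-owner cleanup: drop the alias but leave the storage alone.
   ! Calling deallocate(s% net_iso) here instead would free the block
   ! while g% net_iso would typically still test as associated, so the
   ! guarded deallocate below would then free the same block again.
   if (associated(s% net_iso)) nullify(s% net_iso)

   ! Owner cleanup: the storage is released exactly once.
   if (associated(g% net_iso)) then
      deallocate(g% net_iso)
      nullify(g% net_iso)
   end if

   write(*,*) 'star alias associated: ', associated(s% net_iso)
   write(*,*) 'net array associated:  ', associated(g% net_iso)
end program alias_free_sketch
```

After both cleanup steps neither pointer is associated, and the block has been
freed only once. Whether the same ownership applies to s% chem_id in MESA is
exactly the open question above, so treat this as a picture of the pattern
rather than a statement about the actual code.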
Thanks again!
Warrick
------------
Warrick Ball
Postdoc, School of Physics and Astronomy
University of Birmingham, Edgbaston, Birmingham B15 2TT
wball at bison.ph.bham.ac.uk
+44 (0)121 414 4552
On Tue, 19 Dec 2017, r.j.farmer at uva.nl wrote:
> Hi
> So I *think* I know what's going on.
>
> If we compile your test case (nice to see a self-contained problem!) with
> the -ggdb flag and run it under valgrind (valgrind ./test_free_star), we get:
>
> ==22304== Invalid free() / delete / delete[] / realloc()
> ==22304== at 0x4C2EDAA: free (vg_replace_malloc.c:530)
> ==22304== by 0x8FAE49: __net_def_MOD_do_free_net (net_def.f90:386)
> ==22304== by 0x4D739B: __net_MOD_set_net (net.f90:639)
> ==22304== by 0x60D4BF: __init_MOD_model_builder (init.f90:1191)
> ==22304== by 0x60DD84: __init_MOD_create_pre_ms_model (init.f90:1014)
> ==22304== by 0x42F448: __star_lib_MOD_star_create_pre_ms_model (star_lib.f90:378)
> ==22304== by 0x42B7D1: MAIN__ (test_free_star.f90:39)
> ==22304== by 0x42B8E3: main (test_free_star.f90:2)
> ==22304== Address 0xa9e3a70 is 0 bytes inside a block of size 31,424 free'd
> ==22304== at 0x4C2EDAA: free (vg_replace_malloc.c:530)
> ==22304== by 0x4A14B7: __alloc_MOD_free_hydro (alloc.f90:236)
> ==22304== by 0x4A2E0B: __alloc_MOD_free_arrays (alloc.f90:224)
> ==22304== by 0x4A2E51: __alloc_MOD_free_star_data (alloc.f90:210)
> ==22304== by 0x42B880: MAIN__ (test_free_star.f90:47)
> ==22304== by 0x42B8E3: main (test_free_star.f90:2)
> ==22304== Block was alloc'd at
> ==22304== at 0x4C2DBFD: malloc (vg_replace_malloc.c:299)
> ==22304== by 0x9074B6: __net_initialize_MOD_start_net_def (net_initialize.f90:1313)
> ==22304== by 0x4D7322: __net_MOD_set_net (net.f90:729)
> ==22304== by 0x60D4BF: __init_MOD_model_builder (init.f90:1191)
> ==22304== by 0x60DD84: __init_MOD_create_pre_ms_model (init.f90:1014)
> ==22304== by 0x42F448: __star_lib_MOD_star_create_pre_ms_model (star_lib.f90:378)
> ==22304== by 0x42B7D1: MAIN__ (test_free_star.f90:39)
> ==22304== by 0x42B8E3: main (test_free_star.f90:2)
>
>
> which is telling us we have two calls to free on the net_iso array, one in
> star/private/alloc.f90 line 236 and one in net/public/net_def.f90 line 386
> (where you found the error). If we look at the one in alloc.f90 we have:
>
>    if (associated(s% net_iso)) then
>       deallocate(s% net_iso)
>       nullify(s% net_iso)
>    end if
>
> where s%net_iso points to g%net_iso (somewhere), so when we
> deallocate(s%net_iso) we are effectively also deallocating g%net_iso, though
> this leaves g%net_iso associated (which is why the if check in the net code
> doesn't stop this from happening).
>
> So for now the temporary fix seems to be to comment out the deallocate calls
> (but leave the nullify statements) in alloc.f90 for s%net_iso and also for
> s%chem_id (function free_hydro, line 232 and line 236 for v10108), though
> I'm not sure yet whether this will have other repercussions.
> I've run 10 loops of your code and it seems to work.
>
>> I also noticed that with each new model, MESA would consume another ~0.6%
>> of my 16 GB of RAM, which is about 100MB
>
> I see the same issue with 10108, so I guess we are leaking memory somewhere.
>
> Rob
>
>
> On 15 December 2017 at 14:52, Warrick Ball via Mesa-users <
> mesa-users at lists.mesastar.org> wrote:
>
>> Hi everyone,
>>
>> I'm collaborating on an application that will use MESA as a library to
>> evaluate a *lot* of stellar models. I'm not sure exactly how many but the
>> only relevant point here is that I need to release star pointers.
>>
>> We found that we couldn't make more than ten models and traced that back
>> to a hardcoded maximum number of pointers on line 110 of
>> `$MESA_DIR/star/public/star_def.f90` (r10108).
>>
>>    integer, parameter :: max_star_handles = 10 ! this can be increased as necessary
>>
>> Granted, one option is to increase this number to something more suitable
>> but we are calling `free_star` from `star_lib` after each evolutionary run,
>> which should be releasing star handles. I also noticed that with each new
>> model, MESA would consume another ~0.6% of my 16 GB of RAM, which is about
>> 100MB (though our code uses r9793 so newer versions might behave better).
>>
>> So I started following function calls and noticed this in `star_lib` (lines
>> 65--75):
>>
>>    subroutine free_star(id, ierr)
>>       use alloc, only: free_star_data
>>       ! frees the handle and all associated data
>>       integer, intent(in) :: id
>>       integer, intent(out) :: ierr
>>
>>       ierr = 0
>>       return ! skip this for now
>>
>>       call free_star_data(id, ierr)
>>    end subroutine free_star
>>
>> Note the line that says "skip this for now"! The subroutine doesn't
>> actually release the pointer. Naturally, I tried commenting out that line
>> and running our code again but ran into an error that I haven't been able
>> to debug.
>>
>> I've written a small program (attached, with Makefile) that demonstrates
>> the problem. It allocates a new star pointer, creates a pre-MS model, then
>> releases the pointer, twice. If you comment out the early return above,
>> the second iteration fails with the error
>>
>>
>> *** Error in `./test_free_star': double free or corruption (!prev): 0x000000000684f760 ***
>>
>> Program received signal SIGABRT: Process abort signal.
>>
>> Backtrace for this error:
>> #0 0x7f1d3786471f in ???
>> #1 0x7f1d3786469b in ???
>> #2 0x7f1d378663b0 in ???
>> #3 0x7f1d378aea86 in ???
>> #4 0x7f1d378b5e8d in ???
>> #5 0x7f1d378b7988 in ???
>> #6 0x7f1d378c02ed in ???
>> #7 0x7968b1 in __net_def_MOD_do_free_net
>> at ../public/net_def.f90:384
>> #8 0x49710b in __net_MOD_set_net
>> at ../private/net.f90:639
>> #9 0x560b9f in model_builder
>> at ../private/init.f90:1191
>> #10 0x561434 in __init_MOD_create_pre_ms_model
>> at ../private/init.f90:1014
>> #11 0x418448 in __star_lib_MOD_star_create_pre_ms_model
>> at ../public/star_lib.f90:378
>> #12 0x414971 in ???
>> #13 0x414a83 in ???
>> #14 0x7f1d3784e039 in ???
>> #15 0x414489 in ???
>> #16 0xffffffffffffffff in ???
>> Aborted (core dumped)
>>
>>
>> I've looked at the listed functions but can't see that there's a problem.
>> The error crops up at the start of the pre-MS subroutine, where it sets the
>> nuclear reaction network (`set_net`). That checks if there's currently a
>> net and, if so, releases the existing one (`do_free_net`). But that then
>> crashes, as far as I can tell, because of g% net_iso. From
>> `$MESA_DIR/net/public/net_def.f90`, lines 383--386:
>>
>>    if (associated(g% net_iso)) then
>>       deallocate(g% net_iso)
>>       nullify(g% net_iso)
>>    end if
>>
>> I think these are the guilty lines because I sprinkled `write` statements
>> everywhere and then tried moving this up and down the sequence of
>> `deallocate` statements.
>>
>> The only guidance I can find online is that something appears to be
>> corrupting the array structure, possibly by going out of bounds, but I have
>> no idea how to track this down in the code. One option appears to be to
>> call `deallocate` with an error integer, in which case the code won't crash
>> and I can maybe just ignore the error. But I'm a bit far beyond the limits
>> of my knowledge/debugging skills and would appreciate any advice anyone can
>> offer.
>>
>> Thanks,
>> Warrick
>>
>> PS: Hoping the embarrassingly subjectless version of this message is
>> swallowed in the ether...
>>
>>
>> ------------
>> Warrick Ball
>> Postdoc, School of Physics and Astronomy
>> University of Birmingham, Edgbaston, Birmingham B15 2TT
>> wball at bison.ph.bham.ac.uk
>> +44 (0)121 414 4552
>