[Mesa-users] Experience with Docker Container and r 11554
Ian Foley
ifoley2008 at gmail.com
Sun Apr 14 17:41:28 EDT 2019
Hi,
I thought this might be valuable for users running MESA r11554 on
limited-memory computers. My machine has 8 GB of RAM and I am running the
Docker container on Windows 10 Professional. This release of MESA adds
additional EOS data files, which can stretch available memory to the limit;
they can be turned off if necessary, but I wanted to use them if possible.
This release of MESA adds the inlist control "num_steps_for_garbage_collection",
which defaults to 1000 and is useful for removing EOS data that is no longer
needed and is taking up too much memory. The problem is that the added EOS
data files can grow by nearly 1 GB within 100 steps. If we set garbage
collection to occur every 100 steps, it causes a significant performance hit,
because re-allocating the large EOS data files takes time. It is also true
that much of the evolution does not require a big jump in EOS data.
A better approach is to track free memory and trigger garbage collection only
when it drops below 1 GB. The code below, which can be added to
run_star_extras, achieves this, and I have found it very useful.
! For version 10398 and onwards.
! For version 11554 onwards there is an inlist control to do garbage
! collection (default: every 1000 models). However, that is a fixed
! setting and will often incur extra cost when doing the garbage
! collection. So instead we track free memory and only do garbage
! collection when it drops below a certain minimum. This routine frees
! all the EOS memory and then re-initializes it; when MESA takes the
! next step it reloads only the EOS data it needs.
! Garbage collection code courtesy of Rob Farmer, 18 April 2018.
! Requires local declarations along these lines:
!    character (len=8) :: string1
!    character (len=7) :: string2
!    character (len=80) :: string3
!    integer :: int1, int2, int3, int4, int5, int6, int7, int8, int9, free
if (mod(s% model_number,100) == 0) then
   write(*,*) 'Process id ', getpid()
   write(*,*) 'Output from Linux free command'
   call execute_command_line('free > memory.txt')
   call execute_command_line('free')
   open(100, file='memory.txt', status='old', iostat=ierr)
   read(100,*) string3                                    ! header line
   read(100,'(a8,6i12)') string1, int1, int2, int3, int4, int5, int6  ! Mem: row
   read(100,'(a7,3i12)') string2, int7, int8, int9        ! Swap: row
   close(100)
   free = int3 + int9   ! free Mem + free Swap, in kB
   write(*,*) 'Model ', s% model_number
   write(*,*) 'Total free memory = ', free
   if (free < 1000000) then   ! less than ~1 GB free
      write(*,*) 'Do garbage collection'
      call eos_shutdown()
      call eos_init(s% job% eos_file_prefix, s% job% eosDT_cache_dir, &
         s% job% eosPT_cache_dir, &
         s% job% eosDE_cache_dir, .true., ierr)
      call execute_command_line('free')
   end if
end if
Since there is no Fortran intrinsic (or equivalent in most languages) to query
free memory, the code above executes the Linux "free" command every 100 steps,
redirecting its output to a text file. It then opens the text file and parses
its contents to retrieve the amount of free memory. If that is less than 1 GB,
it triggers garbage collection (using code originally from Rob Farmer -
thanks).
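For reference, the same parsing can be done in a single shell pipeline. Here is
a minimal sketch, assuming the standard `free` output format; the here-document
stands in for the live command and uses the values from the session quoted
below:

```shell
# Sum the "free" columns of the Mem: and Swap: rows, mirroring what the
# Fortran snippet computes (free = int3 + int9, in kB).
# The here-doc is sample output; replace it with `free` on a live system.
free_kb=$(awk '/^Mem:/ {m=$4} /^Swap:/ {s=$4} END {print m+s}' <<'EOF'
              total        used        free      shared  buff/cache   available
Mem:        3056888     2885884       84456           0       86548       30556
Swap:       4194300     1475904     2718396
EOF
)
echo "$free_kb"
```

With the sample numbers above this prints 2802852 (84456 + 2718396), the same
quantity the Fortran code compares against the 1 GB threshold.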
I hope this is useful to some of you.
Kind regards
Ian
On Tue, 19 Mar 2019 at 09:21, Ian Foley <ifoley2008 at gmail.com> wrote:
> Thanks for your advice.
>
> Ian
>
> On Tue, 19 Mar 2019 at 8:01 am, Evan Bauer <ebauer at physics.ucsb.edu>
> wrote:
>
>> Hi Ian,
>>
>> Increasing the frequency of garbage collection sounds like a good idea to
>> me, especially if your star is evolving through new EOS regions quickly.
>> There really isn’t much downside to this other than a small speed hit.
>>
>> If you’re very memory constrained and want to go back to the old way of
>> doing things, you also have the option of turning off the new EOS tables
>> with
>> use_eosDT2 = .false.
>> use_eosELM = .false.
>>
>> Cheers,
>> Evan
>>
>>
>> On Mar 18, 2019, at 10:46 AM, Ian Foley <ifoley2008 at gmail.com> wrote:
>>
>> Thanks Rob for the detailed explanation. I will follow your suggestion
>> and check for memory leaks. btw I'm using Windows 10 Professional.
>>
>> I may also have to increase the frequency of garbage collection to avoid
>> a crash. 2GB is a lot more memory to need in 400 models when we are a long
>> way into the evolution and I have a limit of 8GB of real memory.
>>
>> Kind regards
>> ian
>>
>>
>> On Mon, 18 Mar 2019 at 20:49, Rob Farmer <r.j.farmer at uva.nl> wrote:
>>
>>> Hi,
>>> >
>>> > Num EOS files loaded 13000 7 0 17 12 17
>>> > Num EOS files loaded 13001 0 0 10 4 17
>>>
>>> The ordering of the numbers is in line 410 in
>>> star/job/run_star_support.f90,
>>>
>>> write(*,*) "Num EOS files loaded", s%model_number, num_DT, num_PT, &
>>> num_DT2, num_PTEH, num_ELM
>>>
>>> So it's telling you how many of each type of EOS file is currently loaded
>>> into memory. Then, by comparing the counts before and after the garbage
>>> collection, we can see whether we removed any EOS files.
>>>
>>> So in your case we removed 7 eosDT files, 7 eosDT2 files, 8 PTEH files,
>>> and no ELM or PT files. This is only meant as a diagnostic, but it does
>>> show that in this case you removed ~40% of the loaded EOS files, which
>>> should be a good memory saving.
>>>
>>> >What amazed me was that between model 12460 and 12810 MESA has needed
>>> nearly 2 MB of memory! which it has had to grab from the swap space leaving
>>> less than 1MB available. That seems a huge amount over a short evolution
>>> period. (1475904 to 3373664)
>>>
>>> I assume you meant GB here? What is likely happening is that your model is
>>> entering a new region of parameter space, so we need to load in more EOS
>>> data files.
>>>
>>> But to check that it's not a memory leak, run the model once up to some
>>> model number and record the approximate memory used at the end. Then do a
>>> restart from, say, 1000 steps before the end and record its memory usage
>>> at the end. If they are about the same, that is just normal MESA memory
>>> usage for this problem. If the first run uses a lot more memory, then we
>>> have leaked memory somewhere.
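One quick way to record a run's memory footprint for such a comparison is to
query its resident set size (RSS). This is a sketch, not MESA code: `MESA_PID`
is a placeholder for the process id MESA prints, and the current shell's pid
is used here only so the snippet runs standalone:

```shell
# Report the resident set size (RSS, in kB) of a running process.
# MESA_PID is a placeholder; substitute the pid your MESA run prints.
MESA_PID=$$            # demo only: use the current shell's own pid
rss_kb=$(ps -o rss= -p "$MESA_PID" | tr -d ' ')
echo "RSS of process ${MESA_PID}: ${rss_kb} kB"
```

Running this at the same model number in both the full run and the restart
gives directly comparable numbers.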
>>>
>>> Also, are you using the Windows Home (or Pro?) Docker container? If Home,
>>> you can configure the memory it uses: in the win_home_dockerMESA.sh file,
>>> at the docker-machine create line, you can set the memory it has with
>>> --virtualbox-memory=2048 (in MB). You may need to delete the old virtual
>>> machine first with the utils/uninstall_win_home.sh script if you change
>>> the memory value.
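The recreation steps described above might look like the following. This is a
sketch only: the machine name "default" and 4096 MB are example values, and the
commands are echoed rather than executed so nothing is actually destroyed:

```shell
# Dry-run sketch of resizing a Docker Toolbox (docker-machine) VM, as used
# by the Windows Home setup. Machine name and memory size are examples.
MACHINE=default
MEM_MB=4096
echo docker-machine rm -f "$MACHINE"
echo docker-machine create -d virtualbox --virtualbox-memory="$MEM_MB" "$MACHINE"
```

Dropping the `echo` prefixes would perform the removal and recreation for real.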
>>>
>>> Rob
>>>
>>>
>>> On Mon, 18 Mar 2019 at 04:31, Ian Foley via Mesa-users <
>>> mesa-users at lists.mesastar.org> wrote:
>>>
>>>> Hi Evan,
>>>>
>>>> Thanks for setting up r11554 in the MESA-Docker container. I have
>>>> deleted older versions as you suggested. Everything seems to be working
>>>> well, except that a 1M model evolution crashed, apparently from running
>>>> out of memory, near model 13000. I've attached files that I think are
>>>> sufficient for you to reproduce the effect.
>>>>
>>>> Memory is cleaned up at models 12,000 and 13,000 because of the
>>>> following settings. I might have been able to prevent the crash by
>>>> decreasing the first one.
>>>> num_steps_for_garbage_collection = 1000
>>>> report_garbage_collection = .true.
>>>> After the crash, I restarted the run at model 12,000, and since I
>>>> modified "re" and "rn" to run star in the background, I can monitor the
>>>> memory with "free". I entered the model number in the terminal so I
>>>> could record when I executed "free".
>>>>
>>>> What amazed me was that between model 12460 and 12810 MESA has needed
>>>> nearly 2 MB of memory! which it has had to grab from the swap space leaving
>>>> less than 1MB available. That seems a huge amount over a short evolution
>>>> period. (1475904 to 3373664)
>>>>
>>>> This is the report_garbage_collection output at model 13000. I haven't
>>>> yet gone to the source code to find out what the numbers mean.
>>>>
>>>> Num EOS files loaded 13000 7 0 17 12 17
>>>> Num EOS files loaded 13001 0 0 10 4 17
>>>>
>>>> Terminal output for run from model 12000 to 13010.
>>>>
>>>> docker at a9e770e1dc66:~/docker_work/1M$ 12450
>>>> -bash: 450: command not found
>>>> docker at a9e770e1dc66:~/docker_work/1M$ free
>>>>               total        used        free      shared  buff/cache   available
>>>> Mem:        3056888     2885884       84456           0       86548       30556
>>>> Swap:       4194300     1475904     2718396
>>>> docker at a9e770e1dc66:~/docker_work/1M$ 12460
>>>> -bash: 12460: command not found
>>>> docker at a9e770e1dc66:~/docker_work/1M$ 12810
>>>> -bash: 12810: command not found
>>>> docker at a9e770e1dc66:~/docker_work/1M$ free
>>>>               total        used        free      shared  buff/cache   available
>>>> Mem:        3056888     2895980       76212           0       84696       21444
>>>> Swap:       4194300     3373664      820636
>>>> docker at a9e770e1dc66:~/docker_work/1M$ 12900
>>>> -bash: 12900: command not found
>>>> docker at a9e770e1dc66:~/docker_work/1M$ free
>>>>               total        used        free      shared  buff/cache   available
>>>> Mem:        3056888     2893880       69184           0       93824       18968
>>>> Swap:       4194300     3348584      845716
>>>> docker at a9e770e1dc66:~/docker_work/1M$ 12990
>>>> -bash: 12990: command not found
>>>> docker at a9e770e1dc66:~/docker_work/1M$ free
>>>>               total        used        free      shared  buff/cache   available
>>>> docker at a9e770e1dc66:~/docker_work/1M$ free
>>>>               total        used        free      shared  buff/cache   available
>>>> Mem:        3056888     2883048       79752           0       94088       29472
>>>> Swap:       4194300     3935380      258920
>>>> docker at a9e770e1dc66:~/docker_work/1M$ 13010
>>>> -bash: 13010: command not found
>>>> docker at a9e770e1dc66:~/docker_work/1M$ free
>>>>               total        used        free      shared  buff/cache   available
>>>> Mem:        3056888     2905660       75560           0       75668       16104
>>>> Swap:       4194300     2024256     2170044
>>>>
>>>> The use of such a large chunk of memory over such a short run of models
>>>> is what concerns me. Should I expect this with r11554, or is there some
>>>> bug?
>>>>
>>>> Attached: re2.txt is the redirected terminal output, and the photo is
>>>> for model 12,000, used for the restart, on my Windows 10 Professional
>>>> software environment.
>>>> I hope that is all you need.
>>>>
>>>> kind regards
>>>> Ian
>>>>
>>>>
>>>> On Sun, 17 Mar 2019 at 06:34, Evan Bauer <ebauer at physics.ucsb.edu>
>>>> wrote:
>>>>
>>>>> Hi Ian,
>>>>>
>>>>> 11554 should be ready to go if you just “git pull” in the MESA-docker
>>>>> repository to update. Let me know if that isn’t working for you. I
>>>>> definitely recommend the upgrade.
>>>>>
>>>>> While you’re at it, I’ll also remind you that it’s probably a good
>>>>> idea to clean up your older docker images to save hard drive space. You can
>>>>> remove the image of 11532 with this command:
>>>>> docker rmi evbauer/mesa_lean:11532.01
>>>>>
>>>>> You can also check what other older images might be sitting around
>>>>> (and how much space they’re using) with this command:
>>>>> docker images
>>>>>
>>>>> If you’re not regularly using the older MESA versions in those images,
>>>>> you should probably get rid of them too with the “docker rmi” command.
>>>>>
>>>>> Cheers,
>>>>> Evan
>>>>>
>>>>>
>>>>> _______________________________________________
>>>> mesa-users at lists.mesastar.org
>>>> https://lists.mesastar.org/mailman/listinfo/mesa-users
>>>>
>>>>