[Mesa-users] Experience with Docker Container and r 11554
Ian Foley
ifoley2008 at gmail.com
Mon Apr 15 04:31:07 EDT 2019
Hi Rob. Thanks for your input. I will investigate using C-based syscalls.
I have already switched to following Michael’s suggestion, but using
execute_command_line to create the output as a text file. In the meantime,
this routine is working well and helping a lot. I have wondered about a
generic solution, since I have no idea how universal "free" and
"/proc/meminfo" are to other Unix operating systems.
Kind regards
Ian
On Mon, 15 Apr 2019 at 17:48, Rob Farmer <r.j.farmer at uva.nl> wrote:
> Hi Ian,
>
> Thanks for the suggestion. While I think about the best way for MESA to
> do this, you would probably want to consider Michael's suggestion of
> reading /proc/meminfo rather than using execute_command_line(). The
> issue with execute_command_line() is that it calls fork(), which can
> lead to memory exhaustion, depending on how much you can overcommit
> your memory. See some of the previous mesa-users issues from when we
> called the shell's mv and mkdir via execute_command_line(). As a
> result, we stopped using execute_command_line() in MESA and switched to
> a C-based syscall interface to call mv and mkdir.
>
> Rob
>
> On Mon, 15 Apr 2019 at 00:45, Ian Foley via Mesa-users <
> mesa-users at lists.mesastar.org> wrote:
>
>> Thanks Michael!
>>
>> Ian
>>
>>
>> On Mon, 15 Apr 2019 at 08:21, Michael Ashley <m.ashley at unsw.edu.au>
>> wrote:
>>
>>> Hi Ian,
>>>
>>> Another way of obtaining the amount of free memory on a Linux system is
>>> to read the file /proc/meminfo.
>>>
>>> If you use the "free" command, note the "-t" switch which calculates the
>>> total for you.
>>>
>>> Regards, Michael
>>>
>>> On Mon, Apr 15, 2019 at 07:41:28AM +1000, Ian Foley via Mesa-users wrote:
>>> > Hi,
>>> >
>>> > I thought this might be valuable for users running MESA r11554 on
>>> > limited-memory computers. My computer has 8 GB RAM and I am also
>>> > using the Docker container on Windows 10 Professional. This release
>>> > of MESA adds additional EOS data files; these can test available
>>> > memory to the limit and can be turned off if necessary, but I wanted
>>> > to use them if possible.
>>> >
>>> > This release of MESA adds the inlist option
>>> > "num_steps_for_garbage_collection", which defaults to 1000 and is
>>> > useful for removing EOS data that is no longer needed and is taking
>>> > up too much memory. The problem is that the added EOS data files can
>>> > total nearly 1 GB within 100 steps. If we set garbage collection to
>>> > occur every 100 steps, there is a significant performance hit, as
>>> > re-allocating the large EOS data files takes time. It is also true
>>> > that much of the evolution does not require a big jump in EOS data.
>>> >
>>> > Much better would be to track free memory and activate garbage
>>> > collection when it drops below 1 GB. The code below, which can be
>>> > added to run_star_extras, achieves this, and I have found it very
>>> > useful.
>>> >
>>> >    ! For version 10398 and onwards.
>>> >    ! For version 11554 onwards there is an inlist option to do
>>> >    ! garbage collection, defaulting to every 1000 models. However,
>>> >    ! this is a fixed setting and will often incur extra cost when
>>> >    ! doing the garbage collection. So instead we track memory and
>>> >    ! only do garbage collection when free memory falls below a
>>> >    ! certain minimum. This routine frees all the EOS memory and then
>>> >    ! re-initializes it; when MESA takes the next step it reloads only
>>> >    ! the EOS data it needs.
>>> >    ! (string1-3, int1-9, free, and ierr are declared as appropriate
>>> >    ! in run_star_extras.)
>>> >    ! Garbage collection code courtesy of Rob Farmer, 18 April 2018.
>>> >
>>> >    if (mod(s% model_number,100) == 0) then
>>> >       write(*,*) 'Process id ', getpid()
>>> >       write(*,*) 'Output from Linux free command'
>>> >       call execute_command_line('free >memory.txt')
>>> >       call execute_command_line('free')
>>> >       open(100, file='memory.txt', status='old', iostat=ierr)
>>> >       read(100,*) string3                                  ! header line
>>> >       read(100,'(a8,6i12)') string1, int1, int2, int3, &   ! Mem: line
>>> >          int4, int5, int6
>>> >       read(100,'(a7,3i12)') string2, int7, int8, int9      ! Swap: line
>>> >       close(100)
>>> >       free = int3 + int9                       ! MemFree + SwapFree, kB
>>> >       write(*,*) 'Model ', s% model_number
>>> >       write(*,*) 'Total free memory = ', free
>>> >       if (free < 1000000) then                 ! less than ~1 GB free
>>> >          write(*,*) 'Do garbage collection'
>>> >          call eos_shutdown()
>>> >          call eos_init(s% job% eos_file_prefix, &
>>> >             s% job% eosDT_cache_dir, s% job% eosPT_cache_dir, &
>>> >             s% job% eosDE_cache_dir, .true., ierr)
>>> >          call execute_command_line('free')
>>> >       endif
>>> >    endif
>>> >
>>> > Since there is no Fortran intrinsic (or equivalent in most other
>>> > languages) to find free memory, what we do above is execute the
>>> > Linux free command every 100 steps, redirecting its output to a text
>>> > file. Then we open the text file and parse its content to retrieve
>>> > the amount of free memory. If it is less than 1 GB, we activate
>>> > garbage collection (using code originally from Rob Farmer - thanks).
>>> >
>>> > I hope this will be useful to some of you.
>>> >
>>> > Kind regards
>>> > Ian
>>> >
>>> >
>>> >
>>> >
>>> > On Tue, 19 Mar 2019 at 09:21, Ian Foley <ifoley2008 at gmail.com> wrote:
>>> >
>>> > Thanks for your advice.
>>> >
>>> > Ian
>>> >
>>> > On Tue, 19 Mar 2019 at 8:01 am, Evan Bauer <ebauer at physics.ucsb.edu> wrote:
>>> >
>>> > Hi Ian,
>>> >
>>> > Increasing the frequency of garbage collection sounds like a good
>>> > idea to me, especially if your star is evolving through new EOS
>>> > regions quickly. There really isn’t much downside to this other than
>>> > a small speed hit.
>>> >
>>> > If you’re very memory constrained and want to go back to the old way
>>> > of doing things, you also have the option of turning off the new EOS
>>> > tables with
>>> >    use_eosDT2 = .false.
>>> >    use_eosELM = .false.
>>> >
>>> > Cheers,
>>> > Evan
>>> >
>>> >
>>> >
>>> > On Mar 18, 2019, at 10:46 AM, Ian Foley <
>>> ifoley2008 at gmail.com>
>>> > wrote:
>>> >
>>> > Thanks Rob for the detailed explanation. I will follow your
>>> > suggestion and check for memory leaks. By the way, I'm using Windows
>>> > 10 Professional.
>>> >
>>> > I may also have to increase the frequency of garbage collection to
>>> > avoid a crash. 2 GB is a lot of extra memory to need within 400
>>> > models when we are a long way into the evolution and I have a limit
>>> > of 8 GB of real memory.
>>> >
>>> > Kind regards
>>> > ian
>>> >
>>> >
>>> > On Mon, 18 Mar 2019 at 20:49, Rob Farmer <
>>> r.j.farmer at uva.nl> wrote:
>>> >
>>> > Hi,
>>> > >
>>> >    Num EOS files loaded  13000   7   0  17  12  17
>>> >    Num EOS files loaded  13001   0   0  10   4  17
>>> >
>>> > The ordering of the numbers is from line 410 in
>>> > star/job/run_star_support.f90:
>>> >
>>> >    write(*,*) "Num EOS files loaded", s% model_number, num_DT, &
>>> >       num_PT, num_DT2, num_PTEH, num_ELM
>>> >
>>> > So it's telling you how many of each type of EOS file is currently
>>> > loaded into memory. Then by comparing before and after the garbage
>>> > collection we can see whether we removed any EOS files.
>>> >
>>> > So in your case we removed 7 eosDT files, 7 eosDT2 files, and 8 PTEH
>>> > files, and no ELM or PT files. This is only meant as a diagnostic,
>>> > but it does show that in this case you removed ~40% of the loaded
>>> > EOS files, which should be a good memory saving.
>>> >
>>> > > What amazed me was that between model 12460 and 12810 MESA has
>>> > > needed nearly 2 MB of memory! which it has had to grab from the
>>> > > swap space, leaving less than 1 MB available. That seems a huge
>>> > > amount over a short evolution period. (1475904 to 3373664)
>>> >
>>> > I assume you meant GB here? What is likely happening is that your
>>> > model is entering a new region of parameter space, so we need to
>>> > load in more EOS data files.
>>> >
>>> > But to check that it's not a memory leak, run the model once up to
>>> > some model number and record the approximate memory used at the end.
>>> > Then do a restart from, say, 1000 steps before the end and record
>>> > its memory usage at the end. If they are about the same, then that
>>> > is just normal MESA memory usage for this problem. If the first run
>>> > uses a lot more memory, then we have leaked memory somewhere.
>>> >
>>> > Also, are you using the Windows Home (or Pro?) Docker container? If
>>> > Home, you can configure the memory it uses: if you look in the
>>> > win_home_dockerMESA.sh file at the docker-machine create line, you
>>> > can set the memory it has with --virtualbox-memory=2048 (in MB). You
>>> > may need to delete the old virtual machine first with the
>>> > utils/uninstall_win_home.sh script if you change the memory value.
>>> >
>>> > Rob
>>> >
>>> >
>>> > On Mon, 18 Mar 2019 at 04:31, Ian Foley via Mesa-users
>>> <
>>> > mesa-users at lists.mesastar.org> wrote:
>>> >
>>> > Hi Evan,
>>> >
>>> > Thanks for setting up r11554 in the MESA-Docker container. I have
>>> > deleted older versions as you suggested. Everything seems to be
>>> > working well, except that in an inlist for a 1M model evolution it
>>> > crashed in a way that looked like running out of memory near model
>>> > 13000. I've attached files that I think are sufficient for you to
>>> > reproduce the effect.
>>> >
>>> > Memory is cleaned up at models 12,000 and 13,000 because of the
>>> > following settings. I might have been able to prevent the crash by
>>> > decreasing the first setting.
>>> >    num_steps_for_garbage_collection = 1000
>>> >    report_garbage_collection = .true.
>>> > After the crash, I restarted the run at model 12,000, and since I
>>> > modified "re" and "rn" to run star in the background, I can monitor
>>> > the memory with "free". I entered the model number in the terminal
>>> > so I can record when I executed "free".
>>> >
>>> > What amazed me was that between model 12460 and 12810 MESA has
>>> > needed nearly 2 MB of memory! which it has had to grab from the swap
>>> > space, leaving less than 1 MB available. That seems a huge amount
>>> > over a short evolution period. (1475904 to 3373664)
>>> >
>>> > This is the report garbage collection output at model 13000. I
>>> > haven't yet gone to the source code to find out what the numbers
>>> > mean.
>>> >
>>> >    Num EOS files loaded  13000   7   0  17  12  17
>>> >    Num EOS files loaded  13001   0   0  10   4  17
>>> >
>>> > Terminal output for run from model 12000 to 13010.
>>> >
>>> > docker at a9e770e1dc66:~/docker_work/1M$ 12450
>>> > -bash: 450: command not found
>>> > docker at a9e770e1dc66:~/docker_work/1M$ free
>>> >               total        used        free      shared  buff/cache   available
>>> > Mem:        3056888     2885884       84456           0       86548       30556
>>> > Swap:       4194300     1475904     2718396
>>> > docker at a9e770e1dc66:~/docker_work/1M$ 12460
>>> > -bash: 12460: command not found
>>> > docker at a9e770e1dc66:~/docker_work/1M$ 12810
>>> > -bash: 12810: command not found
>>> > docker at a9e770e1dc66:~/docker_work/1M$ free
>>> >               total        used        free      shared  buff/cache   available
>>> > Mem:        3056888     2895980       76212           0       84696       21444
>>> > Swap:       4194300     3373664      820636
>>> > docker at a9e770e1dc66:~/docker_work/1M$ 12900
>>> > -bash: 12900: command not found
>>> > docker at a9e770e1dc66:~/docker_work/1M$ free
>>> >               total        used        free      shared  buff/cache   available
>>> > Mem:        3056888     2893880       69184           0       93824       18968
>>> > Swap:       4194300     3348584      845716
>>> > docker at a9e770e1dc66:~/docker_work/1M$ 12990
>>> > -bash: 12990: command not found
>>> > docker at a9e770e1dc66:~/docker_work/1M$ free
>>> >               total        used        free      shared  buff/cache   available
>>> > docker at a9e770e1dc66:~/docker_work/1M$ free
>>> >               total        used        free      shared  buff/cache   available
>>> > Mem:        3056888     2883048       79752           0       94088       29472
>>> > Swap:       4194300     3935380      258920
>>> > docker at a9e770e1dc66:~/docker_work/1M$ 13010
>>> > -bash: 13010: command not found
>>> > docker at a9e770e1dc66:~/docker_work/1M$ free
>>> >               total        used        free      shared  buff/cache   available
>>> > Mem:        3056888     2905660       75560           0       75668       16104
>>> > Swap:       4194300     2024256     2170044
>>> >
>>> > The use of such a large memory chunk over such a short run of models
>>> > is what concerns me. Should I expect this with r11554, or is there
>>> > some bug?
>>> >
>>> > The attached file re2.txt is the redirected terminal output. The
>>> > photo is for model 12,000, used for the restart, on my Windows 10
>>> > Professional software environment. I hope that is all you need.
>>> >
>>> > kind regards
>>> > Ian
>>> >
>>> >
>>> > On Sun, 17 Mar 2019 at 06:34, Evan Bauer <
>>> > ebauer at physics.ucsb.edu> wrote:
>>> >
>>> > Hi Ian,
>>> >
>>> > 11554 should be ready to go if you just “git pull” in the
>>> > MESA-docker repository to update. Let me know if that isn’t working
>>> > for you. I definitely recommend the upgrade.
>>> >
>>> > While you’re at it, I’ll also remind you that it’s probably a good
>>> > idea to clean up your older docker images to save hard drive space.
>>> > You can remove the image of 11532 with this command:
>>> >    docker rmi evbauer/mesa_lean:11532.01
>>> >
>>> > You can also check what other older images might be sitting around
>>> > (and how much space they’re using) with this command:
>>> >    docker images
>>> >
>>> > If you’re not regularly using the older MESA versions in those
>>> > images, you should probably get rid of them too with the “docker
>>> > rmi” command.
>>> >
>>> > Cheers,
>>> > Evan
>>> >
>>> >
>>> >
>>> >
>>> > _______________________________________________
>>> > mesa-users at lists.mesastar.org
>>> >
>>> https://lists.mesastar.org/mailman/listinfo/mesa-users
>>> >
>>> >
>>> >
>>> >
>>> >
>>>
>>>
>>>
>>> --
>>> Professor Michael Ashley Department of Astrophysics
>>> University of New South Wales http://www.phys.unsw.edu.au/~mcba
>>>
>> --
>>
>>