[Mesa-users] Experience with Docker Container and r 11554
Rob Farmer
r.j.farmer at uva.nl
Mon Apr 15 03:47:40 EDT 2019
Hi Ian,
Thanks for the suggestion. While I think about the best way for MESA to
do this, you should probably consider Michael's suggestion of reading
/proc/meminfo rather than using execute_command_line(). The issue with
execute_command_line() is that it calls fork(), which can lead to memory
exhaustion depending on how much your system can overcommit its memory.
See some of the previous mesa-users issues from when we called the
shell's mv and mkdir via execute_command_line(); for that reason we
stopped using execute_command_line() in MESA and switched to a C-based
syscall interface to call mv and mkdir.
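
Reading /proc/meminfo from Fortran needs only a few lines. Here is a
minimal, untested sketch (the subroutine name and buffer sizes are
illustrative, not part of MESA):

      ! Sketch: return MemAvailable (in kB) from /proc/meminfo, no fork().
      subroutine get_mem_available_kb(mem_kb, ierr)
         integer(kind=8), intent(out) :: mem_kb
         integer, intent(out) :: ierr
         character(len=256) :: line
         integer :: iounit
         mem_kb = -1
         open(newunit=iounit, file='/proc/meminfo', status='old', &
              action='read', iostat=ierr)
         if (ierr /= 0) return
         do
            read(iounit, '(a)', iostat=ierr) line
            if (ierr /= 0) exit                    ! end of file: not found
            if (index(line, 'MemAvailable:') == 1) then
               ! line looks like "MemAvailable:  7964512 kB"
               read(line(14:), *, iostat=ierr) mem_kb
               exit
            end if
         end do
         close(iounit)
      end subroutine get_mem_available_kb
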
Rob
On Mon, 15 Apr 2019 at 00:45, Ian Foley via Mesa-users <mesa-users at lists.mesastar.org> wrote:
> Thanks Michael!
>
> Ian
>
>
> On Mon, 15 Apr 2019 at 08:21, Michael Ashley <m.ashley at unsw.edu.au> wrote:
>
>> Hi Ian,
>>
>> Another way of obtaining the amount of free memory on a Linux system is
>> to read the file /proc/meminfo.
>>
>> If you use the "free" command, note the "-t" switch, which calculates
>> the total for you.
>>
>> Regards, Michael
>>
>> On Mon, Apr 15, 2019 at 07:41:28AM +1000, Ian Foley via Mesa-users wrote:
>> > Hi,
>> >
>> > I thought this might be valuable for users running MESA r11554 on
>> > limited-memory computers. My computer has 8GB RAM and I am also using
>> > the docker container on Windows 10 Professional. This release of MESA
>> > adds additional EOS data files; these can test available memory to the
>> > limit and can be turned off if necessary, but I wanted to use them if
>> > possible.
>> >
>> > This release of MESA adds the inlist option
>> > "num_steps_for_garbage_collection", which defaults to 1000 and is
>> > useful for removing EOS data that is no longer needed and is taking up
>> > too much memory. The problem is that the loaded EOS data files can
>> > grow by nearly 1 GB within 100 steps, and setting garbage collection
>> > to occur every 100 steps causes a significant performance hit, since
>> > re-allocating large EOS data files takes time. It is also true that
>> > much of the evolution does not require a big jump in EOS data.
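>> >
>> > For illustration, the fixed-interval approach is controlled from the
>> > inlist; a sketch of the relevant fragment (I believe these options
>> > live in &star_job - check the star_job defaults for your version):
>> >
>> >     &star_job
>> >        num_steps_for_garbage_collection = 100   ! default is 1000
>> >        report_garbage_collection = .true.
>> >     /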
>> >
>> > Much better would be to track free memory and activate garbage
>> > collection when it drops below 1 GB. The code below, which can be
>> > added to run_star_extras, achieves this, and I have found it very
>> > useful.
>> >
>> >     ! For version 10398 and onwards.
>> >     ! For version 11554 onwards there is an inlist option to do garbage
>> >     ! collection (default: every 1000 models). However, this is a fixed
>> >     ! setting and will often lead to extra cost when doing the garbage
>> >     ! collection. So now we track memory and only do garbage collection
>> >     ! when free memory is less than a certain minimum. This code frees
>> >     ! all the eos memory and then re-initializes it; when MESA takes
>> >     ! the next step it will reload only the eos data it needs.
>> >     ! Garbage collection code courtesy of Rob Farmer, 18 April 2018.
>> >
>> >     ! Assumed declarations:
>> >     !   character(len=80) :: string1, string2, string3
>> >     !   integer :: int1, int2, int3, int4, int5, int6, int7, int8, int9
>> >     !   integer :: free, ierr
>> >     if (mod(s% model_number,100) == 0) then
>> >        write(*,*) 'Process id ', getpid()
>> >        write(*,*) 'Output from Linux free command'
>> >        ! run 'free' twice: once into a file for parsing, once to the log
>> >        call execute_command_line('free > memory.txt')
>> >        call execute_command_line('free')
>> >        open(100, file='memory.txt', status='old', iostat=ierr)
>> >        read(100,*) string3   ! skip the header line of column titles
>> >        ! 'Mem:' row: total, used, free, shared, buff/cache, available (kB)
>> >        read(100,'(a8,6i12)') string1, int1, int2, int3, int4, int5, int6
>> >        ! 'Swap:' row: total, used, free (kB)
>> >        read(100,'(a7,3i12)') string2, int7, int8, int9
>> >        close(100)
>> >        free = int3 + int9   ! free RAM + free swap, in kB
>> >        write(*,*) 'Model ', s% model_number
>> >        write(*,*) 'Total free memory = ', free
>> >        if (free < 1000000) then   ! i.e. less than ~1 GB
>> >           write(*,*) 'Do garbage collection'
>> >           call eos_shutdown()
>> >           call eos_init(s% job% eos_file_prefix, s% job% eosDT_cache_dir, &
>> >                s% job% eosPT_cache_dir, s% job% eosDE_cache_dir, .true., ierr)
>> >           call execute_command_line('free')
>> >        end if
>> >     end if
>> >
>> > Since Fortran has no intrinsic for querying free memory, what we do
>> > above is execute the Linux free command every 100 steps, redirecting
>> > its output to a text file. We then open the text file and parse its
>> > content to retrieve the amount of free memory. If it is < 1 GB, we
>> > activate garbage collection (using code originally from Rob Farmer -
>> > thanks).
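>> >
>> > (Following Michael's "-t" tip above: "free -t" appends a Total row
>> > that already sums Mem and Swap, so the parse reduces to one read. A
>> > hedged sketch of that variant, using the same variables as above:)
>> >
>> >     call execute_command_line('free -t > memory.txt')
>> >     open(100, file='memory.txt', status='old', iostat=ierr)
>> >     do
>> >        read(100, '(a)', iostat=ierr) line  ! needs: character(len=256) :: line
>> >        if (ierr /= 0) exit
>> >        if (index(line, 'Total:') == 1) then
>> >           read(line(7:), *) int1, int2, free   ! total, used, free (kB)
>> >           exit
>> >        end if
>> >     end do
>> >     close(100)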
>> >
>> > I hope this will be useful to some of you.
>> >
>> > Kind regards
>> > Ian
>> >
>> >
>> >
>> >
>> > On Tue, 19 Mar 2019 at 09:21, Ian Foley <ifoley2008 at gmail.com> wrote:
>> >
>> > Thanks for your advice.
>> >
>> > Ian
>> >
>> > On Tue, 19 Mar 2019 at 8:01 am, Evan Bauer <ebauer at physics.ucsb.edu> wrote:
>> >
>> > Hi Ian,
>> >
>> > Increasing the frequency of garbage collection sounds like a good idea
>> > to me, especially if your star is evolving through new EOS regions
>> > quickly. There really isn’t much downside to this other than a small
>> > speed hit.
>> >
>> > If you’re very memory constrained and want to go back to the old way
>> > of doing things, you also have the option of turning off the new EOS
>> > tables with
>> >     use_eosDT2 = .false.
>> >     use_eosELM = .false.
>> >
>> > Cheers,
>> > Evan
>> >
>> >
>> >
>> > On Mar 18, 2019, at 10:46 AM, Ian Foley <ifoley2008 at gmail.com> wrote:
>> >
>> > Thanks Rob for the detailed explanation. I will follow your suggestion
>> > and check for memory leaks. By the way, I'm using Windows 10
>> > Professional.
>> >
>> > I may also have to increase the frequency of garbage collection to
>> > avoid a crash. 2GB is a lot more memory to need in 400 models when we
>> > are a long way into the evolution and I have a limit of 8GB of real
>> > memory.
>> >
>> > Kind regards
>> > ian
>> >
>> >
>> > On Mon, 18 Mar 2019 at 20:49, Rob Farmer <r.j.farmer at uva.nl> wrote:
>> >
>> > Hi,
>> > > Num EOS files loaded 13000 7 0 17 12 17
>> > > Num EOS files loaded 13001 0 0 10 4 17
>> >
>> > The ordering of the numbers is given at line 410 in
>> > star/job/run_star_support.f90:
>> >
>> >     write(*,*) "Num EOS files loaded", s% model_number, num_DT, num_PT, &
>> >        num_DT2, num_PTEH, num_ELM
>> >
>> > So it's telling you how many of each type of eos file is currently
>> > loaded into memory. Then, by comparing the counts before and after the
>> > garbage collection, we can see whether we removed any eos files.
>> >
>> > So in your case we removed 7 eosDT files, 7 eosDT2 files, 8 PTEH
>> > files, and no ELM or PT files. This is only meant as a diagnostic, but
>> > it does show that in this case you removed 22 of the 53 loaded eos
>> > files (~40%), which should be a good memory saving.
>> >
>> > > What amazed me was that between model 12460 and 12810 MESA has
>> > > needed nearly 2 MB of memory! which it has had to grab from the swap
>> > > space leaving less than 1MB available. That seems a huge amount over
>> > > a short evolution period. (1475904 to 3373664)
>> >
>> > I assume you meant GB here? What is likely happening is that your
>> > model is entering a new region of parameter space, so we need to load
>> > in more eos data files.
>> >
>> > But to check that it's not a memory leak, run the model once up to
>> > some model number and record the approximate memory used at the end.
>> > Then do a restart from, say, 1000 steps before the end and record its
>> > memory usage at the end. If they are about the same, then that is just
>> > normal MESA memory usage for this problem. If the first run uses a lot
>> > more memory, then we have leaked memory somewhere.
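>> >
>> > (A minimal sketch of per-step logging that makes this comparison easy;
>> > it is illustrative only, reads MemAvailable straight from
>> > /proc/meminfo, and could go at the end of extras_finish_step in
>> > run_star_extras. Assumed declarations: character(len=256) :: line;
>> > integer :: iou, ierr; integer(kind=8) :: mem_kb.)
>> >
>> >     ! append model number and MemAvailable (kB) to mem_log.txt each
>> >     ! step, so a full run and a restarted run can be compared
>> >     mem_kb = -1
>> >     open(newunit=iou, file='/proc/meminfo', status='old', &
>> >          action='read', iostat=ierr)
>> >     if (ierr == 0) then
>> >        do
>> >           read(iou, '(a)', iostat=ierr) line
>> >           if (ierr /= 0) exit
>> >           if (index(line, 'MemAvailable:') == 1) then
>> >              read(line(14:), *, iostat=ierr) mem_kb
>> >              exit
>> >           end if
>> >        end do
>> >        close(iou)
>> >     end if
>> >     open(newunit=iou, file='mem_log.txt', status='unknown', &
>> >          position='append', action='write')
>> >     write(iou, '(i8,1x,i12)') s% model_number, mem_kb
>> >     close(iou)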
>> >
>> > Also, are you using the Windows Home (or Pro?) docker container? If
>> > Home, you can configure the memory it uses: if you look in the
>> > win_home_dockerMESA.sh file at the docker-machine create line, you can
>> > set the memory with --virtualbox-memory=2048 (in MB). You may need to
>> > delete the old virtual machine first, with the
>> > utils/uninstall_win_home.sh script, if you change the memory value.
>> >
>> > Rob
>> >
>> >
>> > On Mon, 18 Mar 2019 at 04:31, Ian Foley via Mesa-users <mesa-users at lists.mesastar.org> wrote:
>> >
>> > Hi Evan,
>> >
>> > Thanks for setting up r11554 in the MESA-Docker container. I have
>> > deleted older versions as you suggested. Everything seems to be
>> > working well, except that an inlist for a 1M model evolution crashed,
>> > apparently from running out of memory, near model 13000. I've attached
>> > files I think sufficient for you to reproduce the effect.
>> >
>> > Memory is cleaned up at models 12,000 and 13,000 because of the
>> > following settings. I might have been able to prevent the crash by
>> > decreasing the first one.
>> >     num_steps_for_garbage_collection = 1000
>> >     report_garbage_collection = .true.
>> >
>> > After the crash, I restarted the run at model 12,000, and since I
>> > modified "re" and "rn" to run star in the background, I can monitor
>> > the memory with "free". I entered the model number in the terminal so
>> > I can record when I executed "free".
>> >
>> > What amazed me was that between model 12460 and 12810 MESA has needed
>> > nearly 2 MB of memory! which it has had to grab from the swap space
>> > leaving less than 1MB available. That seems a huge amount over a short
>> > evolution period. (1475904 to 3373664)
>> >
>> > This is the report garbage collection output at model 13000. I
>> > haven't yet gone to the source code to find out what the numbers mean.
>> >
>> >     Num EOS files loaded 13000 7 0 17 12 17
>> >     Num EOS files loaded 13001 0 0 10 4 17
>> >
>> > Terminal output for the run from model 12000 to 13010:
>> >
>> >     docker@a9e770e1dc66:~/docker_work/1M$ 12450
>> >     -bash: 450: command not found
>> >     docker@a9e770e1dc66:~/docker_work/1M$ free
>> >                   total        used        free      shared  buff/cache   available
>> >     Mem:        3056888     2885884       84456           0       86548       30556
>> >     Swap:       4194300     1475904     2718396
>> >     docker@a9e770e1dc66:~/docker_work/1M$ 12460
>> >     -bash: 12460: command not found
>> >     docker@a9e770e1dc66:~/docker_work/1M$ 12810
>> >     -bash: 12810: command not found
>> >     docker@a9e770e1dc66:~/docker_work/1M$ free
>> >                   total        used        free      shared  buff/cache   available
>> >     Mem:        3056888     2895980       76212           0       84696       21444
>> >     Swap:       4194300     3373664      820636
>> >     docker@a9e770e1dc66:~/docker_work/1M$ 12900
>> >     -bash: 12900: command not found
>> >     docker@a9e770e1dc66:~/docker_work/1M$ free
>> >                   total        used        free      shared  buff/cache   available
>> >     Mem:        3056888     2893880       69184           0       93824       18968
>> >     Swap:       4194300     3348584      845716
>> >     docker@a9e770e1dc66:~/docker_work/1M$ 12990
>> >     -bash: 12990: command not found
>> >     docker@a9e770e1dc66:~/docker_work/1M$ free
>> >                   total        used        free      shared  buff/cache   available
>> >     docker@a9e770e1dc66:~/docker_work/1M$ free
>> >                   total        used        free      shared  buff/cache   available
>> >     Mem:        3056888     2883048       79752           0       94088       29472
>> >     Swap:       4194300     3935380      258920
>> >     docker@a9e770e1dc66:~/docker_work/1M$ 13010
>> >     -bash: 13010: command not found
>> >     docker@a9e770e1dc66:~/docker_work/1M$ free
>> >                   total        used        free      shared  buff/cache   available
>> >     Mem:        3056888     2905660       75560           0       75668       16104
>> >     Swap:       4194300     2024256     2170044
>> >
>> > The use of such a large chunk of memory over so few models is what is
>> > concerning me. Should I expect this with r11554, or is there some bug?
>> >
>> > Attached: re2.txt is the redirected terminal output, and the photo is
>> > of model 12,000, used for the restart, on my Windows 10 Professional
>> > software environment. I hope that is all you need.
>> >
>> > kind regards
>> > Ian
>> >
>> >
>> > On Sun, 17 Mar 2019 at 06:34, Evan Bauer <ebauer at physics.ucsb.edu> wrote:
>> >
>> > Hi Ian,
>> >
>> > 11554 should be ready to go if you just “git pull” in the MESA-docker
>> > repository to update. Let me know if that isn’t working for you. I
>> > definitely recommend the upgrade.
>> >
>> > While you’re at it, I’ll also remind you that it’s probably a good
>> > idea to clean up your older docker images to save hard drive space.
>> > You can remove the image of 11532 with this command:
>> >
>> >     docker rmi evbauer/mesa_lean:11532.01
>> >
>> > You can also check what other older images might be sitting around
>> > (and how much space they’re using) with this command:
>> >
>> >     docker images
>> >
>> > If you’re not regularly using the older MESA versions in those
>> > images, you should probably get rid of them too with the “docker rmi”
>> > command.
>> >
>> > Cheers,
>> > Evan
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>> >
>>
>> >
>>
>>
>> --
>> Professor Michael Ashley, Department of Astrophysics,
>> University of New South Wales, http://www.phys.unsw.edu.au/~mcba
>>
> --
> _______________________________________________
> mesa-users at lists.mesastar.org
> https://lists.mesastar.org/mailman/listinfo/mesa-users
>
>