[Mesa-users] Experience with Docker Container and r 11554
Ian Foley
ifoley2008 at gmail.com
Sun Apr 14 18:45:37 EDT 2019
Thanks Michael!
Ian
On Mon, 15 Apr 2019 at 08:21, Michael Ashley <m.ashley at unsw.edu.au> wrote:
> Hi Ian,
>
> Another way of obtaining the amount of free memory on a Linux system is to
> read the file /proc/meminfo.
>
> If you use the "free" command, note the "-t" switch which calculates the
> total for you.
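>
> For illustration, a minimal Fortran sketch of the /proc/meminfo approach
> (untested; the unit handling and the choice of the "MemAvailable:" field
> are my assumptions, not anything from MESA):
>
>     integer :: u, ios, mem_kb
>     character(len=128) :: line
>     ! scan /proc/meminfo for the MemAvailable line; the value is in kB
>     open(newunit=u, file='/proc/meminfo', status='old', action='read', iostat=ios)
>     do
>        read(u,'(a)',iostat=ios) line
>        if (ios /= 0) exit
>        if (line(1:13) == 'MemAvailable:') then
>           read(line(14:),*) mem_kb
>           exit
>        end if
>     end do
>     close(u)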
>
> Regards, Michael
>
> On Mon, Apr 15, 2019 at 07:41:28AM +1000, Ian Foley via Mesa-users wrote:
> > Hi,
> >
> > I thought this might be valuable for users running MESA r11554 on
> > limited-memory computers. My computer has 8 GB RAM and I am also using the
> > docker container on Windows 10 Professional. This release of MESA adds
> > additional EOS data files; these can test available memory to the limit and
> > can be turned off if necessary, but I wanted to use them if possible.
> >
> > This release of MESA adds the inlist option
> > "num_steps_for_garbage_collection", which defaults to 1000 and is useful
> > for removing EOS data that is no longer needed and is taking up too much
> > memory. The problem is that the added EOS data files can total close to
> > 1 GB within 100 steps, yet setting garbage collection to occur every 100
> > steps incurs a significant performance hit, as the re-allocation of large
> > EOS data files takes time. It is also true that much of the evolution does
> > not require a big jump in EOS data.
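> >
> > For reference, the two relevant settings as they appear in an inlist
> > (these lines are quoted verbatim later in this thread; which namelist
> > section they belong in depends on your MESA version, so check the defaults
> > files):
> >
> >     num_steps_for_garbage_collection = 1000
> >     report_garbage_collection = .true.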
> >
> > Much better would be to track free memory and activate garbage collection
> > when it drops below 1 GB. The code below, which can be added to
> > run_star_extras, achieves this, and I have found it very useful.
> >
> >     ! For version 10398 and onwards.
> >     ! For version 11554 onwards there is an inlist option
> >     ! (num_steps_for_garbage_collection, default 1000 models) to do garbage
> >     ! collection. However, this is a fixed setting and will often incur
> >     ! extra cost when doing the garbage collection. So now we track memory
> >     ! and only do garbage collection when free memory is less than a
> >     ! certain minimum. This routine frees all the eos memory then
> >     ! re-initializes it; when MESA takes the next step it reloads only the
> >     ! eos data it needs.
> >     ! Garbage collection code courtesy of Rob Farmer, 18 April 2018.
> >     !
> >     ! Needs declarations along these lines in the calling routine (not
> >     ! shown in the original post):
> >     !    character(len=8)  :: string1
> >     !    character(len=7)  :: string2
> >     !    character(len=80) :: string3
> >     !    integer :: int1, int2, int3, int4, int5, int6, int7, int8, int9
> >     !    integer :: free, ierr
> >     ! getpid() is a gfortran extension; execute_command_line is standard
> >     ! Fortran 2008.
> >
> >     if (mod(s% model_number,100) == 0) then
> >        write(*,*) 'Process id ', getpid()
> >        write(*,*) 'Output from Linux free command'
> >        call execute_command_line('free > memory.txt')
> >        call execute_command_line('free')
> >        open(100, file='memory.txt', status='old', iostat=ierr)
> >        read(100,*) string3                                                  ! header line
> >        read(100,'(a8,6i12)') string1, int1, int2, int3, int4, int5, int6    ! "Mem:" line
> >        read(100,'(a7,3i12)') string2, int7, int8, int9                      ! "Swap:" line
> >        close(100)
> >        free = int3 + int9   ! free RAM + free swap, in kB
> >        write(*,*) 'Model ', s% model_number
> >        write(*,*) 'Total free memory = ', free
> >        if (free < 1000000) then   ! less than ~1 GB free in total
> >           write(*,*) 'Do garbage collection'
> >           call eos_shutdown()
> >           call eos_init(s% job% eos_file_prefix, s% job% eosDT_cache_dir, &
> >                s% job% eosPT_cache_dir, &
> >                s% job% eosDE_cache_dir, .true., ierr)
> >           call execute_command_line('free')
> >        end if
> >     end if
> >
> > Since there is no Fortran intrinsic (or equivalent in other languages) for
> > querying free memory, what we do above is execute the Linux free command
> > every 100 steps, redirecting its output to a text file. We then open the
> > text file and parse its contents to retrieve the amount of free memory. If
> > it's < 1 GB we activate garbage collection (using code originally from Rob
> > Farmer - thanks).
> >
> > I hope this will be useful to some.
> >
> > Kind regards
> > Ian
> >
> >
> >
> >
> > On Tue, 19 Mar 2019 at 09:21, Ian Foley <ifoley2008 at gmail.com> wrote:
> >
> > Thanks for your advice.
> >
> > Ian
> >
> > On Tue, 19 Mar 2019 at 8:01 am, Evan Bauer <ebauer at physics.ucsb.edu> wrote:
> >
> > Hi Ian,
> >
> > Increasing the frequency of garbage collection sounds like a good idea to
> > me, especially if your star is evolving through new EOS regions quickly.
> > There really isn’t much downside to this other than a small speed hit.
> >
> > If you’re very memory constrained and want to go back to the old way of
> > doing things, you also have the option of turning off the new EOS tables
> > with
> > use_eosDT2 = .false.
> > use_eosELM = .false.
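> >
> > (For illustration, a sketch of how that might look in an inlist; I believe
> > these were &controls options at r11554, but the namelist placement is my
> > assumption - check controls.defaults for your version:)
> >
> >     &controls
> >        use_eosDT2 = .false.
> >        use_eosELM = .false.
> >     / ! end of controls namelist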
> >
> > Cheers,
> > Evan
> >
> >
> >
> > On Mar 18, 2019, at 10:46 AM, Ian Foley <ifoley2008 at gmail.com> wrote:
> >
> > Thanks Rob for the detailed explanation. I will follow your suggestion and
> > check for memory leaks. BTW I'm using Windows 10 Professional.
> >
> > I may also have to increase the frequency of garbage collection to avoid a
> > crash. 2 GB is a lot more memory to need in 400 models when we are a long
> > way into the evolution, and I have a limit of 8 GB of real memory.
> >
> > Kind regards
> > ian
> >
> >
> > On Mon, 18 Mar 2019 at 20:49, Rob Farmer <r.j.farmer at uva.nl> wrote:
> >
> > Hi,
> > > Num EOS files loaded  13000    7    0   17   12   17
> > > Num EOS files loaded  13001    0    0   10    4   17
> >
> > The ordering of the numbers is set at line 410 of
> > star/job/run_star_support.f90:
> >
> >     write(*,*) "Num EOS files loaded", s% model_number, num_DT, num_PT, &
> >        num_DT2, num_PTEH, num_ELM
> >
> > So it's telling you how many of each type of eos file is currently loaded
> > into memory. Then by comparing the counts before and after the garbage
> > collection we can see whether we removed any eos files.
> >
> > So in your case we removed 7 eosDT files, 7 eosDT2 files, 8 PTEH files, and
> > no ELM or PT files. This is only meant as a diagnostic, but it does show
> > that in this case you removed 22 of the 53 loaded eos files (~40%), which
> > should be a good memory saving.
> >
> > > What amazed me was that between model 12460 and 12810 MESA has needed
> > > nearly 2 MB of memory! which it has had to grab from the swap space
> > > leaving less than 1MB available. That seems a huge amount over a short
> > > evolution period. (1475904 to 3373664)
> >
> > I assume you meant GB here? What is likely happening is that your model is
> > entering a new region of parameter space, so we need to load in more eos
> > data files.
> >
> > But to check that it's not a memory leak, run the model once up to some
> > model number and record the approximate memory used at the end. Then do a
> > restart from, say, 1000 steps before the end and record its memory usage
> > at the end. If they're about the same, then that is just normal MESA
> > memory usage for this problem. If the first run uses a lot more memory,
> > then we have leaked memory somewhere.
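> >
> > (A sketch of that check from the work directory; the photo name x12000 is
> > hypothetical - pick one roughly 1000 steps before where the first run
> > ended:)
> >
> >     ./rn > run1.log 2>&1 &          # full run in the background
> >     free                            # check memory near the end of the run
> >     ./re x12000 > run2.log 2>&1 &   # restart ~1000 steps before the end
> >     free                            # compare near the end: similar => no leak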
> >
> > Also, are you using the Windows Home (or Pro?) docker container? If Home,
> > you can configure the memory it uses: if you look in the
> > win_home_dockerMESA.sh file at the docker-machine create line, you can set
> > the memory it gets with --virtualbox-memory=2048 (in MB). You may need to
> > delete the old virtual machine first with the utils/uninstall_win_home.sh
> > script if you change the memory value.
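> >
> > (For illustration, the edited create line might look roughly like this;
> > the driver flag and machine name here are my guesses, not the actual
> > contents of win_home_dockerMESA.sh:)
> >
> >     docker-machine create -d virtualbox --virtualbox-memory=4096 mesa-docker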
> >
> > Rob
> >
> >
> > On Mon, 18 Mar 2019 at 04:31, Ian Foley via Mesa-users <mesa-users at lists.mesastar.org> wrote:
> >
> > Hi Evan,
> >
> > Thanks for setting up r11554 in the MESA-Docker container. I have deleted
> > older versions as you suggested. Everything seems to be working well,
> > except that in an inlist for a 1M model evolution it crashed, apparently
> > from running out of memory, near model 13000. I've attached files I think
> > are sufficient for you to reproduce the effect.
> >
> > Memory is cleaned up at models 12,000 and 13,000 because of the following
> > settings. I might have been able to prevent the crash by decreasing the
> > first one.
> >
> >     num_steps_for_garbage_collection = 1000
> >     report_garbage_collection = .true.
> >
> > After the crash, I restarted the run at model 12,000 and, since I modified
> > "re" and "rn" to run star in the background, I can monitor the memory with
> > "free". I entered the model number in the terminal so I can record when I
> > executed "free".
> >
> > What amazed me was that between model 12460 and 12810 MESA has needed
> > nearly 2 MB of memory! which it has had to grab from the swap space
> > leaving less than 1MB available. That seems a huge amount over a short
> > evolution period. (1475904 to 3373664)
> >
> > This is the report garbage collection output at model 13000. I haven't yet
> > gone to the source code to find out what the numbers mean.
> >
> >     Num EOS files loaded  13000    7    0   17   12   17
> >     Num EOS files loaded  13001    0    0   10    4   17
> >
> > Terminal output for run from model 12000 to 13010.
> >
> >     docker@a9e770e1dc66:~/docker_work/1M$ 12450
> >     -bash: 450: command not found
> >     docker@a9e770e1dc66:~/docker_work/1M$ free
> >                   total        used        free      shared  buff/cache   available
> >     Mem:        3056888     2885884       84456           0       86548       30556
> >     Swap:       4194300     1475904     2718396
> >     docker@a9e770e1dc66:~/docker_work/1M$ 12460
> >     -bash: 12460: command not found
> >     docker@a9e770e1dc66:~/docker_work/1M$ 12810
> >     -bash: 12810: command not found
> >     docker@a9e770e1dc66:~/docker_work/1M$ free
> >                   total        used        free      shared  buff/cache   available
> >     Mem:        3056888     2895980       76212           0       84696       21444
> >     Swap:       4194300     3373664      820636
> >     docker@a9e770e1dc66:~/docker_work/1M$ 12900
> >     -bash: 12900: command not found
> >     docker@a9e770e1dc66:~/docker_work/1M$ free
> >                   total        used        free      shared  buff/cache   available
> >     Mem:        3056888     2893880       69184           0       93824       18968
> >     Swap:       4194300     3348584      845716
> >     docker@a9e770e1dc66:~/docker_work/1M$ 12990
> >     -bash: 12990: command not found
> >     docker@a9e770e1dc66:~/docker_work/1M$ free
> >                   total        used        free      shared  buff/cache   available
> >     docker@a9e770e1dc66:~/docker_work/1M$ free
> >                   total        used        free      shared  buff/cache   available
> >     Mem:        3056888     2883048       79752           0       94088       29472
> >     Swap:       4194300     3935380      258920
> >     docker@a9e770e1dc66:~/docker_work/1M$ 13010
> >     -bash: 13010: command not found
> >     docker@a9e770e1dc66:~/docker_work/1M$ free
> >                   total        used        free      shared  buff/cache   available
> >     Mem:        3056888     2905660       75560           0       75668       16104
> >     Swap:       4194300     2024256     2170044
> >
> > The use of such a large chunk of memory over such a short span of models
> > is what concerns me. Should I expect this with r11554, or is there some
> > bug?
> >
> > The attached file re2.txt is the redirected terminal output. The photo is
> > for model 12,000, used for the restart, on my Windows 10 Professional
> > software environment. I hope that is all you need.
> >
> > kind regards
> > Ian
> >
> >
> > On Sun, 17 Mar 2019 at 06:34, Evan Bauer <ebauer at physics.ucsb.edu> wrote:
> >
> > Hi Ian,
> >
> > 11554 should be ready to go if you just “git pull” in the MESA-docker
> > repository to update. Let me know if that isn’t working for you. I
> > definitely recommend the upgrade.
> >
> > While you’re at it, I’ll also remind you that it’s probably a good idea to
> > clean up your older docker images to save hard drive space. You can remove
> > the image of 11532 with this command:
> > docker rmi evbauer/mesa_lean:11532.01
> >
> > You can also check what other older images might be sitting around (and
> > how much space they’re using) with this command:
> > docker images
> >
> > If you’re not regularly using the older MESA versions in those images, you
> > should probably get rid of them too with the “docker rmi” command.
> >
> > Cheers,
> > Evan
> >
> >
> >
> >
>
>
> --
> Professor Michael Ashley, Department of Astrophysics,
> University of New South Wales, http://www.phys.unsw.edu.au/~mcba
>