[Mesa-users] Experience with Docker Container and r 11554

Michael Ashley m.ashley at unsw.edu.au
Sun Apr 14 18:19:10 EDT 2019


Hi Ian,

Another way of obtaining the amount of free memory on a Linux system is to read the file /proc/meminfo.

If you use the "free" command, note the "-t" switch which calculates the total for you.

Regards, Michael

On Mon, Apr 15, 2019 at 07:41:28AM +1000, Ian Foley via Mesa-users wrote:
> Hi,
> 
> I thought this might be valuable for users running MESA r11554 on limited-memory
> computers. My computer has 8GB RAM and I am using the docker container on
> Windows 10 Professional. This release of MESA adds additional EOS data files;
> these can test available memory to the limit and can be turned off if necessary,
> but I wanted to use them if possible.
> 
> This release of MESA adds the inlist control "num_steps_for_garbage_collection",
> which defaults to 1000 and is useful for removing EOS data that is no longer
> needed and is taking up too much memory. The problem is that the added EOS data
> files can total close to 1 GB within 100 steps, and setting garbage collection
> to occur every 100 steps causes a significant performance hit, since
> re-allocating large EOS data files takes time. It is also true that much of the
> evolution does not require a big jump in EOS data.
> 
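> For example, to collect every 100 steps instead, the relevant inlist lines would
> look like this (shown without their enclosing namelist section; check the
> defaults files of your MESA version for where these controls live):
> 
>       num_steps_for_garbage_collection = 100
>       report_garbage_collection = .true.
> 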
> A better approach is to track free memory and activate garbage collection only
> when it drops below 1 GB. The code below, which can be added to run_star_extras,
> achieves this, and I have found it very useful.
> 
>          ! For version 10398 and onwards.
>          ! For version 11554 onwards there is an inlist control to do garbage
>          ! collection, defaulting to every 1000 models. However, that is a
>          ! fixed setting and will often lead to extra cost when doing the
>          ! garbage collection. So now we track free memory and only do garbage
>          ! collection when free memory falls below a certain minimum. This
>          ! routine frees all the eos memory and then re-initializes it; when
>          ! MESA takes the next step it reloads only the eos data it needs.
>          ! Garbage collection code courtesy of Rob Farmer, 18 April 2018.
>          !
>          ! Needs "use eos_lib" plus declarations along these lines:
>          !    character (len=80) :: string1, string2, string3
>          !    integer :: int1, int2, int3, int4, int5, int6, int7, int8, int9
>          !    integer :: free, ierr
> 
>          if (mod(s% model_number,100)==0) then
>             write(*,*) 'Process id ', getpid()
>             write(*,*) 'Output from Linux free command'
>             ! run "free" twice: once redirected to a file for parsing,
>             ! once so the output also appears in the terminal
>             call execute_command_line('free >memory.txt')
>             call execute_command_line('free')
>             open(100, file='memory.txt', status='old', iostat=ierr)
>             read(100,*) string3                              ! skip the header line
>             read(100,'(a8,6i12)') string1, int1, int2, int3, int4, int5, int6  ! Mem: line
>             read(100,'(a7,3i12)') string2, int7, int8, int9                    ! Swap: line
>             close(100)
>             free = int3 + int9   ! free RAM + free swap, in kB
>             write(*,*) 'Model ', s% model_number
>             write(*,*) 'Total free memory = ', free
>             if (free < 1000000) then   ! less than ~1 GB free
>                write(*,*) 'Do garbage collection'
>                call eos_shutdown()
>                call eos_init(s% job% eos_file_prefix, s% job% eosDT_cache_dir, &
>                     s% job% eosPT_cache_dir, &
>                     s% job% eosDE_cache_dir, .true., ierr)
>                call execute_command_line('free')
>             endif
>          endif
> 
> Since there is no Fortran intrinsic (or other built-in language facility) to
> find the free memory, the code above executes the Linux free command every 100
> steps and redirects its output to a text file. We then open the text file and
> parse its content to retrieve the amount of free memory. If it is below 1 GB we
> activate garbage collection (using code originally from Rob Farmer - thanks).
> 
> I hope this will be useful to some of you.
> 
> Kind regards
> Ian
> 
> 
> 
> 
> On Tue, 19 Mar 2019 at 09:21, Ian Foley <ifoley2008 at gmail.com> wrote:
> 
>     Thanks for your advice.
> 
>     Ian
> 
>     On Tue, 19 Mar 2019 at 8:01 am, Evan Bauer <ebauer at physics.ucsb.edu> wrote:
> 
>         Hi Ian,
> 
>         Increasing the frequency of garbage collection sounds like a good idea
>         to me, especially if your star is evolving through new EOS regions
>         quickly. There really isn’t much downside to this other than a small
>         speed hit.
> 
>         If you’re very memory constrained and want to go back to the old way of
>         doing things, you also have the option of turning off the new EOS
>         tables with
>         use_eosDT2 = .false.
>         use_eosELM = .false.
> 
>         Cheers,
>         Evan
> 
> 
> 
>             On Mar 18, 2019, at 10:46 AM, Ian Foley <ifoley2008 at gmail.com>
>             wrote:
> 
>             Thanks Rob for the detailed explanation. I will follow your
>             suggestion and check for memory leaks. btw I'm using Windows 10
>             Professional.
> 
>             I may also have to increase the frequency of garbage collection to
>             avoid a crash. 2GB is a lot more memory to need in 400 models when
>             we are a long way into the evolution and I have a limit of 8GB of
>             real memory.
> 
>             Kind regards
>             ian
> 
> 
>             On Mon, 18 Mar 2019 at 20:49, Rob Farmer <r.j.farmer at uva.nl> wrote:
> 
>                 Hi,
>                 > Num EOS files loaded       13000           7           0          17          12          17
>                 > Num EOS files loaded       13001           0           0          10           4          17
> 
>                 The ordering of the numbers is set at line 410 in
>                 star/job/run_star_support.f90:
> 
>                 write(*,*) "Num EOS files loaded", s%model_number, num_DT, num_PT, &
>                      num_DT2, num_PTEH, num_ELM
> 
>                 So it's telling you how many of each type of eos file is
>                 currently loaded into memory. Then by comparing the counts
>                 before and after the garbage collection we can see whether we
>                 removed any eos files.
> 
>                 Reading off your two lines: num_DT drops from 7 to 0, num_DT2
>                 from 17 to 10, and num_PTEH from 12 to 4, while num_PT (0) and
>                 num_ELM (17) are unchanged. So in your case we removed 7 eosDT
>                 files, 7 eosDT2 files, 8 PTEH files, and no ELM or PT files.
>                 This is only meant as a diagnostic, but it does show that in
>                 this case you removed ~40% of the loaded eos files, which
>                 should be a good memory saving.
> 
>                 >What amazed me was that between model 12460 and 12810 MESA has
>                 needed nearly 2 MB of memory! which it has had to grab from the
>                 swap space leaving less than 1MB available. That seems a huge
>                 amount over a short evolution period. (1475904 to 3373664)
> 
>                 I assume you meant GB here? What is likely happening is your
>                 model is entering a new region of parameter space so we need to
>                 load in more eos data files.
> 
>                 But to check that it's not a memory leak, run the model once up
>                 to some model number and record the approximate memory used at
>                 the end. Then do a restart from, say, 1000 steps before the end
>                 and record its memory usage at the end. If they are about the
>                 same, then that is just normal MESA memory usage for this
>                 problem. If the first run uses a lot more memory, then we have
>                 leaked memory somewhere.
> 
>                 Also, are you using the Windows Home (or Pro?) docker container?
>                 If Home, you can configure the memory it uses: in the
>                 win_home_dockerMESA.sh file, at the docker-machine create line,
>                 you can set the memory it has with --virtualbox-memory=2048 (in
>                 MB). You may need to delete the old virtual machine first with
>                 the utils/uninstall_win_home.sh script if you change the memory
>                 value.
> 
>                 Rob
> 
> 
>                 On Mon, 18 Mar 2019 at 04:31, Ian Foley via Mesa-users <
>                 mesa-users at lists.mesastar.org> wrote:
> 
>                     Hi Evan,
> 
>                     Thanks for setting up r11554 in the MESA-Docker container.
>                     I have deleted older versions as you suggested. Everything
>                     seems to be working well, except that an inlist for a 1M
>                     model evolution crashed, apparently from running out of
>                     memory, near model 13000. I've attached files that I think
>                     are sufficient for you to reproduce the effect.
> 
>                     Memory is cleaned up at models 12,000 and 13,000 because of
>                     the following settings. I might have been able to prevent
>                     the crash by decreasing the garbage collection interval.
>                           num_steps_for_garbage_collection = 1000
>                           report_garbage_collection = .true.
>                     After the crash, I restarted the run at model 12,000. Since
>                     I modified "re" and "rn" to run star in the background, I
>                     can monitor the memory with "free". I typed the model number
>                     into the terminal so I could record when I executed "free"
>                     (hence the "command not found" lines in the output below).
> 
>                     What amazed me was that between model 12460 and 12810 MESA
>                     has needed nearly 2 MB of memory! which it has had to grab
>                     from the swap space leaving less than 1MB available. That
>                     seems a huge amount over a short evolution period. (1475904
>                     to 3373664)
> 
>                     This is the report_garbage_collection output at model
>                     13000. I haven't yet gone to the source code to find out
>                     what the numbers mean.
> 
>                      Num EOS files loaded       13000           7           0          17          12          17
>                      Num EOS files loaded       13001           0           0          10           4          17
> 
>                     Terminal output for run from model 12000 to 13010.
> 
>                     docker at a9e770e1dc66:~/docker_work/1M$ 12450
>                     -bash: 450: command not found
>                     docker at a9e770e1dc66:~/docker_work/1M$ free
>                                   total        used        free      shared  buff/cache   available
>                     Mem:        3056888     2885884       84456           0       86548       30556
>                     Swap:       4194300     1475904     2718396
>                     docker at a9e770e1dc66:~/docker_work/1M$ 12460
>                     -bash: 12460: command not found
>                     docker at a9e770e1dc66:~/docker_work/1M$ 12810
>                     -bash: 12810: command not found
>                     docker at a9e770e1dc66:~/docker_work/1M$ free
>                                   total        used        free      shared  buff/cache   available
>                     Mem:        3056888     2895980       76212           0       84696       21444
>                     Swap:       4194300     3373664      820636
>                     docker at a9e770e1dc66:~/docker_work/1M$ 12900
>                     -bash: 12900: command not found
>                     docker at a9e770e1dc66:~/docker_work/1M$ free
>                                   total        used        free      shared  buff/cache   available
>                     Mem:        3056888     2893880       69184           0       93824       18968
>                     Swap:       4194300     3348584      845716
>                     docker at a9e770e1dc66:~/docker_work/1M$ 12990
>                     -bash: 12990: command not found
>                     docker at a9e770e1dc66:~/docker_work/1M$ free
>                                   total        used        free      shared  buff/cache   available
>                     docker at a9e770e1dc66:~/docker_work/1M$ free
>                                   total        used        free      shared  buff/cache   available
>                     Mem:        3056888     2883048       79752           0       94088       29472
>                     Swap:       4194300     3935380      258920
>                     docker at a9e770e1dc66:~/docker_work/1M$ 13010
>                     -bash: 13010: command not found
>                     docker at a9e770e1dc66:~/docker_work/1M$ free
>                                   total        used        free      shared  buff/cache   available
>                     Mem:        3056888     2905660       75560           0       75668       16104
>                     Swap:       4194300     2024256     2170044
> 
>                     The use of such a large chunk of memory over such a small
>                     number of models is what concerns me. Should I expect this
>                     with r11554, or is there some bug?
> 
>                     Attached: re2.txt is the redirected terminal output, and
>                     the photo is for model 12,000, which was used for the
>                     restart, on my Windows 10 Professional software environment.
>                     I hope that is all you need.
> 
>                     kind regards
>                     Ian
> 
> 
>                     On Sun, 17 Mar 2019 at 06:34, Evan Bauer <
>                     ebauer at physics.ucsb.edu> wrote:
> 
>                         Hi Ian,
> 
>                         11554 should be ready to go if you just “git pull” in
>                         the MESA-docker repository to update. Let me know if
>                         that isn’t working for you. I definitely recommend the
>                         upgrade.
> 
>                         While you’re at it, I’ll also remind you that it’s
>                         probably a good idea to clean up your older docker
>                         images to save hard drive space. You can remove the
>                         image of 11532 with this command:
>                         docker rmi evbauer/mesa_lean:11532.01
> 
>                         You can also check what other older images might be
>                         sitting around (and how much space they’re using) with
>                         this command:
>                         docker images
> 
>                         If you’re not regularly using the older MESA versions
>                         in those images, you should probably get rid of them
>                         too with the “docker rmi” command.
> 
>                         Cheers,
>                         Evan
> 
> 
> 
> 
> 
> 
> 
> 
> 

> _______________________________________________
> mesa-users at lists.mesastar.org
> https://lists.mesastar.org/mailman/listinfo/mesa-users
> 


-- 
Professor Michael Ashley                   Department of Astrophysics
University of New South Wales       http://www.phys.unsw.edu.au/~mcba


