memory leak issue?

alpha754293 · December 5, 2017, 5:07am

So in doing the clustering tests, I’ve discovered that, because of the way the CAE application that I am using integrates the various physics (in the multi-physics simulations) via its GUI, having a GUI appears to be actually quite important.

I’ll have to look into that screen command more now in part because of it.

Having said that, trying to run the solver on the head node (with one slave/compute node connected to it) via command line works, but creates what the software vendor calls a “type 3” error state with the file management system and the way that everything is integrated via the GUI.

So…there’s that.

alpha754293 · December 5, 2017, 5:20am

So I looked into the screen command and it works, as you mentioned, for anything that runs from the command line.

Based on what I have recently learned about the dependency on the GUI, I would actually have to find a way to make it run remotely (so that it doesn’t take up so much RAM on the console) or figure out a way to better limit the amount of RAM that X uses (or turn off X’s caching altogether) in order to help it run better (or at least stop it from taking up so much RAM over time).

If I can’t do that, then unfortunately, I’ll have to switch back to Windows.

malcolmlewis · December 5, 2017, 6:13am

Hi
What about looking at Xvfb, or give the SLE Beta 15 a whirl with (X)wayland?

alpha754293 · December 5, 2017, 3:12pm

[QUOTE=malcolmlewis;40429]Hi
What about looking at Xvfb, or give the SLE Beta 15 a whirl with (X)wayland?[/QUOTE]

I haven’t tried looking at Xvfb only because this is new to me.

SLE Beta 15 is a no-go because the software isn’t certified to run on it (it doesn’t even run on SLES12 SP2). (Tried that before - and there were library dependency failures that I have no interest in chasing because it takes time away from me doing ACTUAL work by performing ACTUAL analyses.)

But I’ll look into that Xvfb to see if that might be something that can help.

Thanks.

ab1 · December 5, 2017, 6:24pm

This is probably the right path, or at least a better one. ‘screen’ is
wonderful, but it does not help with GUI sessions when they disconnect, so
if your application really does need a GUI for some reason (which just
baffles me), then screen will probably not help. I can understand a
high-performance application wanting GPU resources, but not the GUI itself.

I do not remember if you ever said, but are you using Gnome as part of
your X session, or have you tried running X without Gnome in case it is
somehow interfering with the memory usage?

Again, if there is any way to get this software, and a job, and then to
run it on a system of mine, I would be willing to give it a shot to see if
it can be reproduced to see if a bug can be opened, or at least the issue
analyzed a bit more on systems that are not as important as yours.

–
Good luck.

If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below.

If you want to send me a private message, please let me know in the
forum as I do not use the web interface often.

alpha754293 · December 5, 2017, 7:44pm

[QUOTE=ab;40444]This is probably the right path, or at least a better one. ‘screen’ is
wonderful, but it does not help with GUI sessions when they disconnect, so
if your application really does need a GUI for some reason (which just
baffles me), then screen will probably not help. I can understand a
high-performance application wanting GPU resources, but not the GUI itself.

I do not remember if you ever said, but are you using Gnome as part of
your X session, or have you tried running X without Gnome in case it is
somehow interfering with the memory usage?

Again, if there is any way to get this software, and a job, and then to
run it on a system of mine, I would be willing to give it a shot to see if
it can be reproduced to see if a bug can be opened, or at least the issue
analyzed a bit more on systems that are not as important as yours.[/QUOTE]

The analysis environment is a tightly integrated GUI for the setup of the analysis (which can include multi-physics runs where the results from one solver for one domain are then “connected” as the input to the solution of another physics domain).

So…think of, for example, the deflection of an airplane wing as a result of aerodynamic loading. The aerodynamic loading would be solved using computational fluid dynamics (CFD), and the deflection of the wing due to that loading would be solved using a finite element analysis (FEA) solver.

The transfer of data from the CFD results, and its mapping onto the FEA model, is handled via the GUI.

That is an example of how and why the GUI is and becomes necessary.

I actually tried launching the analyses using the command-line solvers only. Because the GUI also has a file manager that makes sure everything is sync’d properly and consistency is maintained throughout the project, solving those domains with the command-line-only solver causes what they call a “type 3 file error”, which puts the file/data in an out-of-sync/out-of-date state with respect to the GUI’s file manager for the entire project. (And there isn’t a way to re-synchronize and reconcile the results with the file manager, unfortunately. That’s just how they have developed their software. It works great on Windows, since that is a persistently graphical environment that DOESN’T have this kind of memory consumption issue with the underlying piece of software that runs said graphical environment.)

(The test that I am trying to conduct now with the cluster is to have the head node (which has the GUI) launch the solvers on each of the slave/compute nodes, so that I can run the slave/compute nodes in runlevel 3 and they will no longer have a GUI/X running.)

(The GUI has its own clustering/job submission/job management software built in, so I am testing that and seeing what is and isn’t possible with it.)

There are other CAE applications (like NASTRAN/LS-DYNA/Fluent, etc.) that don’t have this type of GUI dependency, but it also means that if you want to solve, say, the kind of aerodynamic wing deflection problem that I described above, getting that set up is SIGNIFICANTLY more complicated than using this other analysis system, where all of the pieces are integrated together via the GUI.

Hopefully that helps to clarify why the GUI is important and a necessary element.

(Also, parametric studies/series of analysis runs are entirely handled via the analysis application GUI as well. If I were to do it via the command line, instead of it automatically setting up all of the runs for the parametric study, I would have to set those up manually myself one at a time, and then write a shell batch processing script to kick the runs off. Because shells aren’t job schedulers, that has limitations in terms of how many simulations can be running concurrently when each run asks for n CPUs and the total has to stay within the number of cores installed in the system. E.g. my node has two physical processors installed, each with 8 physical cores (HTT is disabled), for a total of 16 cores, so I can run 4 jobs of 4 CPUs each, and shell scripting has limited abilities to manage something like that.)
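For reference, the “4 jobs of 4 CPUs each on 16 cores” arithmetic can be handled outside the GUI with a small script. The following is only a minimal sketch in Python, not something the vendor provides: the function names are made up for illustration, the demo commands are placeholders for the real solver invocations, and it does nothing about the GUI file manager going out of sync.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def max_concurrent_jobs(total_cores: int, cores_per_job: int) -> int:
    """How many jobs fit on the node at once without oversubscribing cores."""
    if cores_per_job <= 0:
        raise ValueError("cores_per_job must be positive")
    return max(1, total_cores // cores_per_job)


def run_all(commands, total_cores=16, cores_per_job=4):
    """Run every command, with at most total_cores // cores_per_job at a time.

    Returns the exit codes in the same order as `commands`.
    """
    slots = max_concurrent_jobs(total_cores, cores_per_job)
    with ThreadPoolExecutor(max_workers=slots) as pool:
        # Each worker thread blocks on one process, so the pool size
        # is the concurrency limit.
        return list(pool.map(lambda cmd: subprocess.run(cmd).returncode, commands))


if __name__ == "__main__":
    # Placeholder commands standing in for the real solver invocations.
    demo = [[sys.executable, "-c", f"print('run {i} done')"] for i in range(8)]
    print(run_all(demo))
```

With the defaults, eight queued runs go through four at a time; a new run starts as soon as one of the four slots frees up.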

The GUI, on the other hand, via its built-in clustering/job submission/job scheduler, can submit four runs of four CPUs each at a time for as many runs as there are in my parametric study. So again, hopefully, this will help highlight why the GUI is important.

I am running GNOME. I have not tried running X [i]sans[/i] GNOME only because I don’t know how to do that/set that up.

In googling “Xorg high memory usage”, it looks like I am not the only person who is asking about it, but it also appears that I am one of the few people who is asking about it AT THIS LEVEL (e.g. where X is consuming 100 GiB of RAM out of 125 GiB installed).
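To put numbers on that growth over time, the resident memory of the X server can be read from /proc on Linux. This is a minimal sketch, assuming the VmRSS line of /proc/<pid>/status is the figure of interest; the function names are made up for illustration, and the demo reads the script’s own process rather than X.

```python
import os
import re


def parse_vmrss_kib(status_text):
    """Return the VmRSS value in KiB from the text of /proc/<pid>/status, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def rss_kib(pid):
    """Read the current resident set size of a process, in KiB (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())


if __name__ == "__main__":
    # Demo on this script's own process. For the X server, pass its PID
    # instead (for example, the one reported by `pidof Xorg`).
    print(f"{rss_kib(os.getpid()) / 1024:.1f} MiB resident")
```

Calling rss_kib() on the X server’s PID at intervals during a run, and logging the result, would show whether the usage climbs steadily or jumps at particular steps of the analysis.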

In researching that, I found this:

Option "XaaNoPixmapCache" Disables caching of patterns in offscreen video memory.

(Source: https://www.x.org/releases/current/doc/man/man5/xorg.conf.5.xhtml)

So one of the sources says to limit how much caching X can do using ulimit -m (which I still don’t know exactly how to do), and in this xorg.conf man page, it looks like there is an option for me to disable caching altogether.
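On the ulimit part: ulimit -m sets the resident-set limit (RLIMIT_RSS), which according to the getrlimit(2) man page has had no effect on Linux since kernel 2.4.30, so it may not do anything here. The closest enforced limit is on virtual address space (RLIMIT_AS, i.e. ulimit -v). Below is a minimal sketch of launching a process under such a cap from Python; the function name is made up for illustration, and whether X or the CAE application copes gracefully with hitting the cap is untested (allocations simply start failing).

```python
import resource
import subprocess
import sys


def run_with_memory_limit(cmd, limit_bytes):
    """Run `cmd` with its virtual address space capped at `limit_bytes`."""

    def apply_limit():
        # Runs in the child just before exec. Only the soft limit is changed,
        # and it is never set above the existing hard limit.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = limit_bytes if hard == resource.RLIM_INFINITY else min(limit_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=apply_limit, capture_output=True, text=True)


if __name__ == "__main__":
    four_gib = 4 * 1024**3
    # The child just reports the soft limit it was started with.
    child = [sys.executable, "-c",
             "import resource; print(resource.getrlimit(resource.RLIMIT_AS)[0])"]
    print(run_with_memory_limit(child, four_gib).stdout.strip())
```

The limit applies to the launched process and anything it starts, not to processes that are already running.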

(I’ve recently joined x.org’s mailing list, so I am going to ask them about it.)

But as suggested, I can try running Xvfb (which I think I can do via cygwin/X, correct?) and/or try running X without GNOME (I’ll have to research the detailed instructions of how to actually make that happen) to see if either or both would help (such that I keep the GUI, but it just won’t take up so much memory).
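A note on Xvfb: it is an X server that runs on the Linux node itself and renders into memory instead of to a screen, so it does not need cygwin/X. The following is a minimal sketch of starting it and pointing an application at it from Python. The display number :99, the geometry, the two-second wait, and the function names are all assumptions for illustration, and whether the CAE GUI behaves under Xvfb is untested.

```python
import os
import subprocess
import time


def xvfb_command(display=":99", geometry="1280x1024x24"):
    """Build the command line for a virtual framebuffer X server."""
    return ["Xvfb", display, "-screen", "0", geometry]


def env_for_display(display=":99"):
    """Copy the current environment and point X clients at the given display."""
    env = dict(os.environ)
    env["DISPLAY"] = display
    return env


def run_under_xvfb(app_cmd, display=":99"):
    """Start Xvfb, run app_cmd against it, then shut Xvfb down."""
    server = subprocess.Popen(xvfb_command(display))
    try:
        time.sleep(2)  # crude wait for the server to start accepting connections
        return subprocess.run(app_cmd, env=env_for_display(display)).returncode
    finally:
        server.terminate()
        server.wait()
```

Usage would be something like run_under_xvfb(["/path/to/cae_gui"]), where the path is a placeholder. Since nothing is shown on a physical screen, this only suits runs that need an X display to exist, not ones that need someone clicking in the GUI.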

The problem with me releasing the model is that it contains some proprietary information and I don’t have a way of “sanitizing” the model that is causing it. (It’s a benchmark model that I have that I use persistently for hardware and software evaluation that is a “live” model which is why I am unable to release it and/or details about it and how I am running it.)

Yes, I realise that knowing that may help, but unfortunately, my hands are literally tied. I hope you understand.

(P.S. The information that you guys have been able to provide thus far even without it has been helpful in trying to figure out what’s going on with this, so I want to thank everybody who’s contributed thus far.)

Thanks.

P.S. If there are more specific instructions on either running X without GNOME and/or Xvfb (via cygwin/X), that would be greatly appreciated.

I am WAYYY in over my head with this sysadmin stuff as I am only learning as much as it is necessary to facilitate my ability and my system’s ability to run these analyses.

Thank you for all of your help.

Treat me like the dummy that I am so the more explicit the instructions, the better.

Thanks.

alpha754293 · December 5, 2017, 8:55pm

Following the instructions from this:

https://www.redwireservices.com/remote-x11-for-linux-unix

I think that I have remote X11 working with cygwin/X over ssh.

I am still waiting for that node to finish its last analysis/computation/run (on the console itself) before I try the whole thing again, but this time from the remote X11 session instead, to see if it will have a problem with it.

Thanks.
