[QUOTE=ab;40444]This is probably the right path, or at least a better one. ‘screen’ is
wonderful, but it does not help with GUI sessions when they disconnect, so
if your application really does need a GUI for some reason (which just
baffles me), then screen will probably not help. I can understand a
high-performance application wanting GPU resources, but not the GUI itself.
I do not remember if you ever said, but are you using Gnome as part of
your X session, or have you tried running X without Gnome in case it is
somehow interfering with the memory usage?
Again, if there is any way to get this software, and a job, and then to
run it on a system of mine, I would be willing to give it a shot to see if
it can be reproduced to see if a bug can be opened, or at least the issue
analyzed a bit more on systems that are not as important as yours.
Good luck.[/QUOTE]
The analysis environment is a tightly integrated GUI for setting up the analysis. That can include multi-physics runs, where the results from one solver for one physics domain are “connected” as the input to the solution of another physics domain.
So…think of the deflection of an airplane wing as a result of aerodynamic loading, for example. The aerodynamic loading would be solved using computational fluid dynamics (CFD), and the deflection of the wing due to that loading would be solved using a finite element analysis (FEA) solver.
The transfer of the CFD results and their mapping onto the FEA model is handled via the GUI.
That is one example of how and why the GUI becomes necessary.
I actually tried launching the analyses using the command-line solvers only. The GUI also has a file manager that makes sure everything is synced properly and that consistency is maintained throughout the project, so solving those domains with the command-line-only solvers causes what they call a “type 3 file error”, which puts the files/data in an out-of-sync/out-of-date state with respect to the GUI’s file manager for the entire project. (And there isn’t a way to re-synchronize and reconcile the results with the file manager, unfortunately. That’s just how they have developed their software. It works great on Windows, since that is a persistently graphical environment that DOESN’T have this kind of memory consumption issue with the underlying software that runs said graphical environment.)
(The test that I am trying to conduct now with the cluster is to use the head node (which has the GUI) to launch the solvers on each of the slave/compute nodes, so that I can run the slave/compute nodes in runlevel 3 and they will no longer have a GUI/X running.)
(The GUI has its own clustering/job submission/job management software built in, so I am testing that to see what is and isn’t possible with it.)
There are other CAE applications (like NASTRAN, LS-DYNA, Fluent, etc.) that don’t have this type of GUI dependency, but it also means that if you want to solve, say, the kind of aerodynamic wing deflection problem that I described above, getting that set up is SIGNIFICANTLY more complicated than using this other analysis system, where all of the pieces are integrated together via the GUI.
Hopefully that helps to clarify why the GUI is important and a necessary element.
(Parametric studies/series of analysis runs are also handled entirely via the analysis application GUI. If I were to do it via the command line, instead of it automatically setting up all of the runs for the parametric study, I would have to set those up manually myself, one at a time, and then write a shell batch processing script to kick the runs off. Because shells aren’t job schedulers, that has limitations in terms of how many simulations can be running concurrently when each run asks for n CPUs and the total has to stay at or below the number of CPU cores installed in the system. E.g. my node has two physical processors installed, each with 8 physical cores (HTT is disabled), for a total of 16 cores. So I can run 4 jobs of 4 CPUs each at a time, and shell scripting has limited ability to manage something like that.)
The GUI, on the other hand, via its built-in clustering/job submission/job scheduler, can submit the runs four at a time at four CPUs each, for as many runs as there are in my parametric study. So again, hopefully this helps highlight why the GUI is important.
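For what it’s worth, the “4 jobs of 4 CPUs each” limit can be approximated without a full job scheduler. Below is a minimal Python sketch, assuming 16 cores and 4 cores per run as on the node described above. The names `run_job` and `run_all` and the placeholder commands are hypothetical, made up for illustration; they are not the analysis package’s actual command-line solver, and this does nothing about the file-manager sync problem with command-line runs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor

TOTAL_CORES = 16      # assumption: 2 sockets x 8 cores, HTT disabled
CORES_PER_JOB = 4     # assumption: each run asks for 4 CPUs
MAX_CONCURRENT = TOTAL_CORES // CORES_PER_JOB  # 4 runs at a time


def run_job(cmd):
    """Run one command to completion and return its exit code."""
    return subprocess.run(cmd).returncode


def run_all(commands, max_concurrent=MAX_CONCURRENT):
    """Run every command, never more than max_concurrent at once.

    Exit codes come back in the same order as the commands.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        return list(pool.map(run_job, commands))


if __name__ == "__main__":
    # Placeholder commands standing in for the real solver invocations.
    commands = [[sys.executable, "-c", f"print('run {i} done')"] for i in range(8)]
    print(run_all(commands))
```

Each worker thread just waits on one external process, so as soon as one run finishes the next queued run starts, which keeps four runs going until the list is exhausted.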
I am running GNOME. I have not tried running X [i]sans[/i] GNOME, only because I don’t know how to set that up.
In googling “Xorg high memory usage”, it looks like I am not the only person asking about it, but it also appears that I am one of the few people asking about it AT THIS LEVEL (e.g. where X is consuming 100 GiB of RAM out of 125 GiB installed).
In researching that, I found this:
Option "XaaNoPixmapCache"
Disables caching of patterns in offscreen video memory.
(Source: https://www.x.org/releases/current/doc/man/man5/xorg.conf.5.xhtml)
So one of the sources says to limit how much caching X can do by using ulimit -m (which I still don’t know exactly how to do), and in xorg.conf it looks like there is an option for me to disable caching altogether.
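As far as the setrlimit(2) man page describes it, the resident-set limit that `ulimit -m` sets is not enforced by current Linux kernels, so the closest enforceable cap is the virtual address space limit (`ulimit -v`, i.e. `RLIMIT_AS`). Below is a minimal Python sketch of starting a process under such a cap. `run_capped` and the example command are hypothetical, and whether X would cope gracefully or simply abort when it hits the cap is an open question that this sketch does not answer.

```python
import resource
import subprocess
import sys

GIB = 1024 ** 3


def _cap_address_space(max_bytes):
    """Return a hook that applies RLIMIT_AS in the child before exec."""
    def apply():
        resource.setrlimit(resource.RLIMIT_AS, (max_bytes, max_bytes))
    return apply


def run_capped(cmd, max_bytes):
    """Run cmd with its virtual memory capped; return its exit code."""
    return subprocess.run(cmd, preexec_fn=_cap_address_space(max_bytes)).returncode


if __name__ == "__main__":
    # Example only: a child that tries to allocate 4 GiB under a 2 GiB cap.
    hungry = [sys.executable, "-c", "bytearray(4 * 1024**3)"]
    # Expected to be non-zero on Linux, because the allocation is refused.
    print(run_capped(hungry, 2 * GIB))
```

The limit is inherited by anything the capped process starts, so it only needs to be set once on the process that launches the rest.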
(I’ve recently joined x.org’s mailing list, so I am going to ask them about it.)
But as suggested, I can try running Xvfb (which I think I can do via cygwin/X, correct?) and/or try running X without GNOME (I’ll have to research the detailed instructions for how to actually make that happen) to see if either or both would help (such that I keep the GUI, but it just won’t take up so much memory).
The problem with me releasing the model is that it contains some proprietary information and I don’t have a way of “sanitizing” the model that is causing it. (It’s a “live” benchmark model that I use persistently for hardware and software evaluation, which is why I am unable to release it and/or details about it and how I am running it.)
Yes, I realise that knowing that may help, but unfortunately, my hands are literally tied. I hope you understand.
(P.S. The information that you guys have been able to provide thus far even without it has been helpful in trying to figure out what’s going on with this, so I want to thank everybody who’s contributed thus far.)
Thanks.
P.S. If there are more specific instructions for running X without GNOME and/or Xvfb (via cygwin/X), that would be greatly appreciated.
I am WAYYY in over my head with this sysadmin stuff as I am only learning as much as it is necessary to facilitate my ability and my system’s ability to run these analyses.
Thank you for all of your help.
Treat me like the dummy that I am, so the more explicit the instructions, the better.
Thanks.