Hi all, I wanted to check something with you.
My SLES 11 SP1 server, which runs an SAP application, hangs suddenly.
I checked all the related logs in /var/log but couldn’t find anything related to the hang.
Any idea how to trace why the server suddenly hangs?
I hope an expert here can give me some clues to solve this.
Is there a reason you’re on SP1 vs. SP3?
How long has this system been installed/running?
Bare metal or virtualized, and what are the specs of the system (including
the host, if applicable) in either case?
What, specifically, does “hang” mean in your case? No more screen? No
more keyboard? Applications are stalled but the OS responds in some way?
No response to pings or TCP connection test w/netcat?
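If netcat isn't handy on the machine you test from, the same TCP check can be sketched in a few lines of Python. This is only a rough illustration: the function name is made up for this post, and the host and port you pass in are whatever applies to your server (22 for SSH is just the usual default).

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, `tcp_port_open("your-server", 22)` returning False while the VM still shows as powered on would suggest the network stack (or the whole kernel) has stopped responding, not just the application.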
–
Good luck.
If you find this post helpful and are logged into the web interface,
show your appreciation and click on the star below…
The system has been running for more than a year.
It runs as a virtual machine under a VMware product.
The host has 8 vCPUs and 32 GB of RAM.
When the hang happens, I cannot ping or SSH to the server, and the screen is frozen. The application is down during the hang.
When this happens, I need to force-restart the VM.
Everything runs fine after the forced reboot.
I wonder why this happens.
I have another physical machine with the same specs as the VM,
and the same thing happens there.
Is there any way I can trace what’s wrong with the OS?
By the way, the hardware is healthy.
If I understand you correctly you have a VM and another bare metal system
which both have similar specs (as much as possible between physical and
virtual, perhaps), and both hang in the same way.
Do they hang at the same time? Are they hosting the same services (web,
database, etc.)? Are they used for the backends to the same frontend?
Are you able to go to a TTY at all (Ctrl+Alt+F2) in case you are in a GUI
or a locked TTY but another one works? What is the application involved?
Your best bet at this point is to try to find a way to reliably reproduce
the problem (if it happens to both systems see if something in particular
is affecting both systems in the same way) and then set up the Magic SysRq
keys (Google it on the SUSE sites) to force a kernel dump after opening an
SR with SUSE. They can analyze the kernel dump, or maybe even provide
alternatives to test before getting to this point (including applying
public patches or internal patches for known issues that have not been
released yet) and get you on your way. The more concrete information you
have before going to them, though, the better. They’ll ask for a
‘supportconfig’ which is a utility that is likely already on your systems.
Go ahead and run it now (as ‘root’) on both systems and you can get an
idea of what it creates, which is a big archive of text files that include
configuration and performance data about your system.
Please see SUSE support document 3374462[1] for information on how
to configure the ‘magic SysRq’ keyboard combination to manually
trigger a kernel core dump.
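Before changing anything, it can help to see what the current SysRq setting allows. Below is a minimal sketch that decodes the value in /proc/sys/kernel/sysrq. The bit meanings are taken from the upstream kernel SysRq documentation, so treat them as an assumption and check them against the kernel your SLES release ships; the function names are invented for this example.

```python
# Bit meanings as listed in the upstream kernel SysRq documentation.
# Assumption: verify them against the kernel version you actually run.
SYSRQ_BITS = {
    2: "console log level control",
    4: "keyboard control (SAK, unraw)",
    8: "debugging dumps of processes",
    16: "sync",
    32: "remount read-only",
    64: "process signalling (term, kill, oom-kill)",
    128: "reboot/poweroff",
    256: "nicing of all RT tasks",
}


def describe_sysrq(value):
    """Translate a kernel.sysrq value into a list of enabled function groups."""
    if value == 0:
        return ["disabled"]
    if value == 1:
        return ["all functions enabled"]
    # Any other value is a bitmask of the groups above.
    return [name for bit, name in SYSRQ_BITS.items() if value & bit]


def read_sysrq(path="/proc/sys/kernel/sysrq"):
    """Read the current setting from procfs (Linux only)."""
    with open(path) as f:
        return int(f.read().strip())
```

On the server itself, `describe_sysrq(read_sysrq())` shows what is currently permitted. A value of 1 enables every function; anything else is a partial mask, so compare it with what the support document asks for.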
One question I suspect SUSE will ask if a Service Request is
opened is the question ab asked in his first reply, why SP1 and
not the latest SP3?
Hi, you are right.
One is physical and the other runs in a virtual machine.
Both servers run an SAP application.
When the hangs happened, we rebooted the server directly without trying to switch to another TTY (Ctrl+Alt+F2).
I haven’t submitted an SR to SUSE support because I’m not sure whether my customer has purchased SUSE support.
I will check with them later.
Meanwhile, I will try to set up the Magic SysRq keys that you mentioned above.
After generating the ‘supportconfig’ output, do I need to send the data to the SUSE support team for further analysis?
Was the root cause of this sudden hang on the SUSE 11 SP1 server ever found?
I am having the same issue. My SUSE 11 SP1 server hangs every couple of months. When the issue occurs, we are unable to ping the server and cannot log in to the console.
We have always ended up restarting the VM.
The server has 8 CPUs and 16 GB of RAM and runs as a VM on VMware infrastructure. It is used for SAP and runs an Oracle database instance.
It’s been three years, so your best bet is probably to patch your system.
SP3 is no longer the latest, and SP4 has been out for quite a while.
I’ve set up a few hundred systems since then and never seen this, physical
or virtualized (KVM, Xen, and VMware-based environments) so a lot of
systems are NOT having this problem. Many of those systems had databases,
even Oracle databases (11 and 12).