comprehensive monitoring tool for SLES

Hi all,

We have about 50 SLES servers running in our environment and are looking for a comprehensive monitoring tool with notification abilities that can run on each server. We would like this tool to monitor items such as disk space, memory utilization, and processor utilization, and then send a message (via postfix presumably) when certain thresholds are reached. Obviously, command-line tools such as top, free, and iostat are too simplistic for this.

Does anyone have any suggestions?

Jim Gosney
Genesco

Hi
You can’t beat Nagios (part of SLE); deploy NRPE on the monitored
systems. Otherwise the next best bet is SNMP. Another useful tool is
sec (I mention it only because I’m the packager/maintainer on openSUSE),
which can monitor anything you like as long as it can be logged.


Cheers Malcolm °¿° SUSE Knowledge Partner (Linux Counter #276890)
openSUSE 13.1 (Bottle) (x86_64) GNOME 3.10.2 Kernel 3.11.10-7-desktop
If you find this post helpful and are logged into the web interface,
please show your appreciation and click on the star below… Thanks!

Thanks Malcolm. I think we’re going to try SNMP first since that’s already installed. Now I’ve just got to figure out how to configure it. :) 20 years in IT and I’ve never used SNMP.

Hi
It’s not too bad ;) Just have a look at the man page for snmpconf:

man snmpconf

It’s been 9 years since I last used it in earnest…
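
For orientation, here is a rough sketch of the kind of file snmpconf helps you produce. It assumes the Net-SNMP agent with its configuration in /etc/snmp/snmpd.conf (check your installation); the community string, contact details and limits are placeholders, not recommendations.

```
# /etc/snmp/snmpd.conf (sketch only; every value below is a placeholder)

# Read-only access for the given community string, from localhost only.
rocommunity  public  127.0.0.1

# Free-text system information reported by the agent.
syslocation  Server room
syscontact   admin@example.com

# Flag an error when / has less than 10% free space.
disk  /  10%

# Flag an error when the 1, 5 or 15 minute load average exceeds these values.
load  8 6 4
```

The disk and load lines only make the agent expose an error flag; something still has to poll the agent or receive its traps before anyone gets notified.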



Hi Jim,

[QUOTE=jgosney]I think we’re going to try SNMP first since that’s already installed.[/QUOTE]

The question should be less “which tool” and more “what do I need to do with it”.

SNMP, in the sense of “already installed”, is an agent on the server, comparable to NRPE from the Nagios family or the many other agents out there. In your initial post you asked for a tool with “notification capabilities”. While the SNMP agent does have such a capability in principle, it usually amounts to no more than sending a “trap” (an SNMP packet that is sent event-driven), and you’ll need a “manager” that both catches that packet and reports it to you in a fashion matching your requirements.

From 30 years of systems management experience, I strongly suggest that you first define what you want to monitor and how you’d like to react to a violation of your self-defined monitoring limits. Only then will you be able to identify a tool that does what you require and is reasonable in terms of both TCA and TCO (total cost of acquisition and total cost of ownership).

If your monitoring needs currently boil down to “availability”, “enough remaining disk space” and “enough performance”, the questions you should ask yourself are

  • Can I run with central monitoring and notification (versus multi-tier solutions with remote office mid-level managers)?
  • What notification mechanisms do I need to support (email, SMS, online UI,…)?
  • Do I need escalation mechanisms (to handle situations where no-one reacts to sent notifications)?
  • Do I need historic data (versus plain notifications on limit violations)?
  • How does all this play together with existing management tools and/or other systems’ monitoring requirements (so you don’t create a myriad of non-cooperating tool chains in your IT)?

Then you’d have a basic starting point for selecting a suitable tool.
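
To make “self-defined monitoring limits” a bit more concrete, here is a minimal sketch of a local check. It assumes a Linux host with df, awk and /proc/loadavg; the limits and the function names (check_limit, disk_used_percent, load_1min) are made up for illustration. It is not a replacement for a real monitoring tool.

```shell
#!/bin/sh
# Hypothetical limits; pick values that match your own requirements.
DISK_LIMIT=90   # percent of the filesystem in use
LOAD_LIMIT=4    # 1-minute load average (integer part)

# Print an alert and return 1 when an integer value exceeds its limit.
check_limit() {
    name=$1
    value=$2
    limit=$3
    if [ "$value" -gt "$limit" ] 2>/dev/null; then
        echo "ALERT: $name is $value (limit $limit)"
        return 1
    fi
    return 0
}

# Percentage of space used on the filesystem holding the given path.
disk_used_percent() {
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Integer part of the 1-minute load average.
load_1min() {
    cut -d. -f1 /proc/loadavg
}

# Run the checks and count how many limits were violated.
alerts=0
check_limit "disk usage of /" "$(disk_used_percent /)" "$DISK_LIMIT" || alerts=$((alerts + 1))
check_limit "1-minute load" "$(load_1min)" "$LOAD_LIMIT" || alerts=$((alerts + 1))

if [ "$alerts" -gt 0 ]; then
    echo "$alerts limit(s) exceeded on $(uname -n)"
fi
```

Run from cron, the output could be handed to a mail command, but escalation, history and central reporting are exactly the parts such a script does not give you.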

Regards,
Jens

Hi Jim

I have used several monitoring tools during my career. One of the most complete is Nagios, but it is very difficult to manage. For some time now I have been using Pandora FMS. You can install software agents on each of the hosts and monitor everything you want. It’s very easy. It also supports network equipment via the SNMP, ICMP, TCP and WMI protocols.
If you are unsure which monitoring tool to use, I recommend taking a look at Pandora FMS. It’s open source and unlimited.
www.pandorafms.com

Regards

Hi
An interesting product. Besides having to pay for support (Nagios
support is included in the SLES subscription), there appear to be code
modifications to some of the original sources with no notes on what was
done.

Now, it so happens that some of the open source applications used appear
to be developed on the Open Build Service (as well as SourceForge) here:
https://build.opensuse.org/project/show/home:slerena

One particular package I noted from the logs was originally copied from
my build at
https://build.opensuse.org/package/show/home:malcolmlewis:Miscellanous/xprobe2
a number of years ago, and comparing the sources shows items deleted and
code modifications to the pristine source, which IMHO is disturbing:
https://build.opensuse.org/package/show?project=home%3Aslerena&package=xprobe2

We have the original source:

ls -la xprobe2-0.3.tar.gz
533636 Mar 26 12:39 xprobe2-0.3.tar.gz
md5sum xprobe2-0.3.tar.gz
3ebb89ed9380038d368327816e34ec54  xprobe2-0.3.tar.gz

And the development source:

ls -la xprobe2-0.3.tar.gz
372590 Mar 26 12:42 xprobe2-0.3.tar.gz
md5sum xprobe2-0.3.tar.gz
862885e06b65100dca52a57cd8e51f93  xprobe2-0.3.tar.gz
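
If you want to repeat that comparison yourself, a small sketch along these lines would do. It assumes md5sum from coreutils is available; verify_md5 is just a helper name made up for this example, and MD5 is used only because those are the digests quoted above.

```shell
#!/bin/sh
# Return 0 when the file's MD5 digest equals the expected value.
verify_md5() {
    file=$1
    expected=$2
    actual=$(md5sum "$file" | awk '{ print $1 }')
    [ "$actual" = "$expected" ]
}

# Digest of the pristine tarball, as quoted in the listing above.
PRISTINE_MD5=3ebb89ed9380038d368327816e34ec54

# Only check when a copy of the tarball is present in the current directory.
if [ -f xprobe2-0.3.tar.gz ]; then
    if verify_md5 xprobe2-0.3.tar.gz "$PRISTINE_MD5"; then
        echo "xprobe2-0.3.tar.gz matches the pristine source"
    else
        echo "xprobe2-0.3.tar.gz differs from the pristine source"
    fi
fi
```

A mismatch only tells you that the archives differ; finding out what changed still means unpacking both and diffing the trees.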

This raises the question of what other undocumented modifications have
been made to the other open source packages included with your product.

In light of the security vulnerabilities out in the wild these days, I
would have to consider a full security review of the product’s code
before looking to implement it. Given that, I would rather spend my
effort configuring a supported, well-known and secure product such as
Nagios for my needs.

