SCOM 2016 – What’s New UNIX/Linux Series: Monitors and Rules Running (Any) Script e.g. Perl

One new feature I am very excited about is the ability to run any sort of script on the UNIX/Linux agent. In SCOM 2012 R2 you had the option to run shell commands for performance rules and monitors. The SCOM 2012 R2 monitor dialog looks like this…

image

…and the rules wizard shows options for creating shell command based alert and performance rules…

image

The problem was that you were restricted to a “one-liner”: the command either did all the work itself or was used to launch a script that already existed on the host. The awesome news is that in SCOM 2016 you can put any sort of UNIX/Linux script directly into your monitors and rules. The new wizard for monitors looks like this…

image

…and the additional script options for alert and performance rules…

image

As you can see, we get these new options:

  • UNIX/Linux Script Three State Monitor
  • UNIX/Linux Script Two State Monitor
  • UNIX/Linux Script (Alert) Rule
  • UNIX/Linux Script (Performance) Rule

I think this is a really awesome step for SCOM. In the past I had a few cases where I could have used these capabilities. How does it work? Let’s see…

UNIX/Linux Script Three State Monitor

I create a UNIX/Linux Script Three State Monitor, target the Linux Computer class, and leave it enabled for demo purposes…

image

…I choose to run every 5 minutes…

image

Next, I paste a Perl script which pings the host DC01.masta.ad and prints the response time in milliseconds…

image

…the full script looks like this…

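#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Net::Ping;

# Net::Ping reports the elapsed time in seconds; hires(1) gives sub-second precision.
my $p=Net::Ping->new();
$p->hires(1);
my ($ret,$duration,$ip)=$p->ping('DC01.masta.ad');
# Convert to milliseconds and print an integer for the monitor to evaluate.
say int($duration * 1000);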
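If Net::Ping is not installed on an agent, a plain shell script could produce the same kind of single-number output. The following is only a minimal sketch, assuming the common `ping` output format with a `time=… ms` field. The helper name `extract_ms` and the sample line are made up for illustration and are not part of SCOM.

```shell
#!/bin/sh
# Sketch: print the round-trip time in whole milliseconds from ping output.
# Assumes a "time=0.412 ms" style field; other ping variants may differ.

# extract_ms (hypothetical helper): reads ping output on stdin and
# prints the first time= value, truncated to an integer.
extract_ms() {
    sed -n 's/.*time=\([0-9][0-9]*\).*/\1/p' | head -n 1
}

# Illustrative sample line, not captured from a real host.
sample='64 bytes from 10.0.0.10: icmp_seq=1 ttl=128 time=3.41 ms'
printf '%s\n' "$sample" | extract_ms

# On a real agent the call would look like:
#   ping -c 1 DC01.masta.ad | extract_ms
```

If no `time=` field is found (for example on a timeout), the helper prints nothing, so the monitor would need a condition for empty output as well.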

Make sure the account executing the script has sufficient permissions to do so. Next, I map the returned result to the error, warning and healthy states. Error if the response time is greater than 5 milliseconds…

image

…a warning if the response time is equal to 5 milliseconds…

image

…and finally, if the response time is less than 5 milliseconds, the monitor is in a healthy state…

image
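The three mappings above amount to a simple comparison against 5 milliseconds. SCOM evaluates these expressions itself, so the following is only a sketch for dry-running a script's output on the Linux host before pasting it into the wizard. The function name `classify_ms` is hypothetical.

```shell
#!/bin/sh
# Sketch of the monitor's state mapping: >5 error, =5 warning, <5 healthy.
# classify_ms is a hypothetical helper; SCOM does this mapping in the monitor.
classify_ms() {
    ms=$1
    if [ "$ms" -gt 5 ]; then
        echo "Error"
    elif [ "$ms" -eq 5 ]; then
        echo "Warning"
    else
        echo "Healthy"
    fi
}

# Try a few sample values.
for value in 3 5 12; do
    printf '%s ms -> %s\n' "$value" "$(classify_ms "$value")"
done
```

Note that a warning state that only matches exactly 5 is very narrow. For anything beyond a demo, a range is probably the better choice.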

…next…

image

…and now we want to create an alert, matching the monitor state…

image

After a while, checking Health Explorer for the Linux computer shows that it works like a charm…

image

…and an alert was created…

image

In addition, you could also create a diagnostic task using a shell script…

image

…or a recovery task using a shell script for this monitor…

image

UNIX/Linux Script Performance Rule

I select a UNIX/Linux Script Performance Rule, target the Linux Computer class, and leave it enabled…

image

…run every minute :), just for demo…

image

…paste the very same script as I used for the monitor in the first example, and make sure the account has sufficient permissions…

image

…the script looks like this…

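#!/usr/bin/perl
use strict;
use warnings;
use 5.010;
use Net::Ping;

# Net::Ping reports the elapsed time in seconds; hires(1) gives sub-second precision.
my $p=Net::Ping->new();
$p->hires(1);
my ($ret,$duration,$ip)=$p->ping('DC01.masta.ad');
# Convert to milliseconds and print an integer for the rule to collect.
say int($duration * 1000);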

Next, we need to define the filter expression, which looks like this…

image

…and in the performance mapper we just add some information for demo purposes. For this example it doesn’t matter at all…

image
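A performance rule can only chart what the script prints, so it is worth checking that the output is a bare number and nothing else. This is a small sketch, assuming the rule reads the value from the script's standard output (as the StdOut field quoted in the comments suggests). The helper name `is_numeric` is hypothetical.

```shell
#!/bin/sh
# Sketch: check that a script's output is one non-negative integer
# before relying on it as a performance value.
# is_numeric is a hypothetical helper, not part of SCOM.
is_numeric() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit
        *)           return 0 ;;
    esac
}

# Try a clean value, a value with a unit, and empty output.
for out in "4" "4 ms" ""; do
    if is_numeric "$out"; then
        echo "ok: '$out'"
    else
        echo "not numeric: '$out'"
    fi
done
```

Running the script by hand on the agent and piping its output through a check like this is a quick way to catch stray units or warning text before the rule goes live.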

After a couple of minutes, we can see the performance result in the performance view…

image

With these options in place, we can bring scripts from existing UNIX/Linux monitoring tools over to SCOM 2016! Often you simply need to copy and paste the scripts from, e.g., Nagios into the SCOM monitors and rules.
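One thing to keep in mind when porting: Nagios plugins conventionally report their state through the exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and print a status line, optionally followed by `|` and performance data. A small wrapper can turn that into plain values for the monitor or rule expressions. The sketch below makes some assumptions: the helper names `nagios_state` and `perf_value` are hypothetical, and the sample plugin line is illustrative rather than taken from a real check.

```shell
#!/bin/sh
# Sketch: translate Nagios plugin conventions into plain values.

# nagios_state (hypothetical): map a plugin exit code to its state name.
nagios_state() {
    case "$1" in
        0) echo "OK" ;;
        1) echo "WARNING" ;;
        2) echo "CRITICAL" ;;
        *) echo "UNKNOWN" ;;
    esac
}

# perf_value (hypothetical): print the first performance-data value,
# i.e. the number after the first "label=" that follows the "|" separator.
perf_value() {
    printf '%s\n' "$1" | sed -n 's/^[^|]*|[^=]*=\([0-9.][0-9.]*\).*/\1/p'
}

# Illustrative plugin output, not taken from a real check.
line='PING OK - rta = 3.4 ms | rta=3.4ms;100;500;0'
echo "state: $(nagios_state 0)"
echo "value: $(perf_value "$line")"
```

For a monitor, the exit code alone may be enough to drive the state. For a performance rule, only the extracted number should end up on standard output.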

1 Comment

  1. I have some weird behavior with this. We’re trying to alert on open file counts. We’re using this command:

    lsof -u tomcat | tail -n +2 | wc -l

    The output seems to come back fine:

    ReturnValue
    TRUE

    ReturnCode
    0

    StdOut
    335

    We want Errors at 14000 and Warnings at 12000. The alert seems to be triggered regardless of the value.

    These are my threshold settings:

    Error
    //*[local-name()="ReturnCode"] Does not equal 0
    Or
    //*[local-name()="StdOut"] Greater than or equal to 14000

    Warning
    //*[local-name()="ReturnCode"] Equals 0
    And
    //*[local-name()="StdOut"] Greater than or equal to 12000
    And
    //*[local-name()="StdOut"] Less than 14000

    Healthy
    //*[local-name()="ReturnCode"] Equals 0
    And
    //*[local-name()="StdOut"] Less than 12000

    Any ideas what I might be doing wrong?
