SCOM 2012 – Monitor SMA Runbook Instance (VSAE Sample MP)

When you are starting to automate processes in your company you soon will find out, that not all runbooks will run for a short time. There are also cases where you need a runbook, which is using a loop to monitor certain things or trigger other runbooks. In other words, there are cases where you need runbooks which run constantly. It is NOT recommended to have too many such long running runbooks, but if you need to trigger another runbook every 15 minutes there is no way currently in SMA to use the built-in schedule. The limitation is, that you cannot have a lower interval than 1 day (recurring). Although the SMA runbook infrastructure is highly available it could happen that your long running runbooks will stop for many reasons (reboots, shutdowns, bad handling etc.).

A simple version of such a trigger runbook could be this…

workflow Demo-Monitor {
 param (
 [parameter(Mandatory=$true)]
 [int]$Interval,
 [parameter(Mandatory=$true)]
 [string]$RunbookXY
 )

$Endpoint = Get-AutomationVariable -Name ‘SMA-WebServiceEndpoint’
while($true)
     {
     Start-Sleep -s $Interval
     #Start other runbook e.g. Start-SmaRunbook -Name $RunbookXY -WebServiceEndpoint $Endpoint
     }
 }

Most of times these runbooks are important and if you don’t get some sort of notification, you will not find out that the runbooks are not running anymore. One approach, which I think makes sense is to monitor these runbook instances by running a SQL query against the SMA database and check if there is one or more running instances of the runbook. If there is no instance we should receive an alert in SCOM.

For that reason I created a sample management pack, containing an alert rule which executes a PowerShell script. The SQL query looks like this…

SELECT  COUNT(*) as RunningJobCount
FROM    Core.Jobs INNER JOIN
        Core.vwJobs ON Core.vwJobs.JobId = Core.Jobs.JobId INNER JOIN
        Core.vwRunbooks ON Core.vwJobs.RunbookVersionId = Core.vwRunbooks.PublishedRunbookVersionId
        OR Core.vwJobs.RunbookVersionId = Core.vwRunbooks.DraftRunbookVersionID
WHERE JobStatus = ‘Running’ AND RunbookName = ‘$Runbook’

…this delivers the current SMA job instance count of a runbook.

The rest of the script creates a SQL connection to the SMA database / server and depending on the result returns a property bag containing “Error” or “OK”. The script looks like this…

param([string]$SQLInstance,[string]$Database,[string]$Runbook)
#Initialize Property Bag
$API = New-Object -ComObject "MOM.ScriptAPI"
$Bag = $API.CreatePropertyBag()
#SQL Query to check runbook instance count and state
$SQLQuery="
SELECT  COUNT(*) as RunningJobCount
FROM    Core.Jobs INNER JOIN
        Core.vwJobs ON Core.vwJobs.JobId = Core.Jobs.JobId INNER JOIN
        Core.vwRunbooks ON Core.vwJobs.RunbookVersionId = Core.vwRunbooks.PublishedRunbookVersionId 
        OR Core.vwJobs.RunbookVersionId = Core.vwRunbooks.DraftRunbookVersionID
WHERE JobStatus = 'Running' AND RunbookName = '$Runbook'
" 
#Script start logging
$API.LogScriptEvent("CheckRunbook.ps1", 10,0,"Preparing query against $SQLInstance on Database $Database with query: $SQLQuery .")
#Setup ADO connection & recordset
$ADOCon = New-Object -ComObject "ADODB.Connection"
$Results = New-Object -ComObject "ADODB.Recordset"
$OpenStatic = 3
$LockOptimistic = 3
#Setup provider & timeout
$ADOCon.Provider = "SQLOLEDB"
$ADOCon.ConnectionTimeout = 60
$ConnectionString = "Server=$SQLInstance;Database=$Database;Integrated Security=SSPI"
try     
    { 
    $ADOCon.Open($ConnectionString)
    }
catch 
    { 
    #Log error if connection cannot established
    $API.LogScriptEvent("CheckRunbook.ps1", 11,1,"Error connecting. ConnectionString: $ConnectionString Error: $Error[0]")
    }
if ($ADOCon.State -ne 0)
{
        try {     
            #Open SQL connection
            $Results.Open($SQLQuery, $ADOCon, $OpenStatic, $LockOptimistic)
            #Log event if connection is successful
            $API.LogScriptEvent("CheckRunbook.ps1", 20,0,"Successfully executed query against $SQLInstance on Database $Database. Value: $($Results.Fields.Item("RunningJobCount").Value)")
            #Check instance count, if no instance is running, return Error else OK
            If ($Results.Fields.Item("RunningJobCount").Value -eq 0)
            {
                $Bag.AddValue('State', "Error")
                $Bag.AddValue('Runbook', $Runbook)
                $Bag.AddValue('Database', $Database)
                $Bag.AddValue('SQLInstance', $SQLInstance)
            } 
            else 
            {
                $Bag.AddValue('State', "OK")
            }
            $Bag
        } 
        catch 
        { 
            #Log error if the query cannot executed        
            $API.LogScriptEvent("CheckRunbook.ps1", 21,1,"Error executing query against $SQLInstance on Database $Database with query $SQLQuery $Error[0]")
        }
    #Close all connection
    $Results.Close()
    $ADOCon.Close()
}

The script will log events in the OperationsManager event log, if there is a connection or query issue.

The rule itself is targeted in this example at the SCOM RMSE server and this agent will also execute the script. For that reason, make sure you give your SCOM management server action account proper permission on the SMA DB.

You can override the rule with your runbook name / database / server as you can see in this screenshot…

image

If you configured everything the way you want the alert looks like this…

image

..and some information in “Alert Context”…

image

…and “Alert Description”…

image

This management pack should not provide a full featured solution, instead it should give you a good starting point to customize the rule for your needs accordingly. The target and interval are just for demo purposes!

The VSAE project is available on TechNet Gallery.

I hope this helps!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s