
SMA – Database Grooming: Some Things You Should Know


SMA is Microsoft's on-premises automation engine and the successor of Opalis / Orchestrator. We have used this engine quite a lot and have plenty of experience developing PowerShell workflows for SMA. But like every system, you need to maintain and pamper it, otherwise it will strike back at some point. We recently experienced such an issue, and it could happen in your environment as well.

When a runbook is executed it generates a job, see more details here. A job can be in different states, such as Failed, Stopped, Suspended, or Running. So, if you decide you want to debug a runbook because it fails all the time, you can turn on different log levels, also known as runbook streams. There is an excellent post on the System Center: Orchestrator Engineering Blog explaining how to turn on one of the six different streams: Output, Progress, Warning, Error, Verbose, and Debug. Depending on the type, you receive different levels of information.
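If you prefer PowerShell over the portal, the stream settings can also be toggled per runbook. Here is a minimal sketch, assuming the SMA PowerShell module is available; the web service endpoint and runbook name are placeholders:

# Hypothetical endpoint and runbook name - adjust to your environment
$endpoint = "https://sma-server"

# Turn the Verbose stream on for a single runbook while you debug it
Set-SmaRunbookConfiguration -Name "New-VMFromTemplate" -LogVerbose $true -WebServiceEndpoint $endpoint

# Don't forget to turn it off again once you are done debugging
Set-SmaRunbookConfiguration -Name "New-VMFromTemplate" -LogVerbose $false -WebServiceEndpoint $endpoint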

As soon as you turn on, for example, the Verbose stream, you will see it in the job output like this…

image
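If you want to pull that stream data with PowerShell instead of the portal, Get-SmaJobOutput can read a specific stream of a job. A quick sketch, with placeholder endpoint and job ID:

$endpoint = "https://sma-server"                     # hypothetical endpoint
$jobId    = "00000000-0000-0000-0000-000000000000"   # placeholder job ID

# Read only the Verbose stream of that job
Get-SmaJobOutput -Id $jobId -Stream Verbose -WebServiceEndpoint $endpoint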

A best practice is to keep these streams turned off and to enable them only if you really need them. But why is that? Well, this output has to be stored somewhere, otherwise it would not be persistent. In SMA this output gets stored in the Stream.JobStreams table. If you run a select query against this table you will see something like this…

image
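If you would rather stay in PowerShell than open SSMS, a quick peek at the table could look like this. This is just a sketch, assuming the SqlServer (or SQLPS) module is installed and that the database is named SMA on an instance called SQLSERVER01:

# Peek at a few rows of the stream table - server and database names are assumptions
Invoke-Sqlcmd -ServerInstance "SQLSERVER01" -Database "SMA" -Query @"
SELECT TOP (10) *
FROM [Stream].[JobStreams]
"@ | Format-Table -AutoSize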

If you have a closer look at the StreamTypeName column, you can figure out the stream type, such as Verbose, Output, Progress, etc. If you see Output, this does not necessarily mean it is data from Write-Output; it can also be data returned by a runbook to be passed as input to the next runbook. As a side note, you should never use Write-Output for logging in your runbooks; use Write-Verbose instead. Write-Output is only meant for outputting objects that are consumed by other runbooks.
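To illustrate that side note, here is a minimal, hypothetical runbook pattern: Write-Verbose for diagnostic messages, Write-Output only for the object the next runbook consumes.

workflow Get-VMOwner
{
    param([string]$VMName)

    # Diagnostic text: lands in the Verbose stream (only stored if LOG VERBOSE is on)
    Write-Verbose "Looking up owner for VM '$VMName'"

    # Placeholder for the real lookup logic
    $owner = "CONTOSO\jdoe"

    # Data for the caller: lands in the Output stream and can be consumed
    # by the next runbook as input
    Write-Output $owner
}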

Let's assume you accidentally left the LOG VERBOSE RECORD switch turned on, set to TRUE…

image

What happens is that the runbook will log verbose data into the Stream.JobStreams table in the SMA database, and the Stream.JobStreams table will grow quickly. If you want to figure out which runbooks have, for example, verbose logging activated, use this query…

SELECT
      [RunbookName]
      ,[LogDebug]
      ,[LogVerbose]
      ,[LogProgress]
  FROM [SMA].[Core].[Runbooks]
  WHERE LogVerbose = 1 

If you don't check these settings, you could run into some trouble. Let me show you what I mean.

From time to time it makes sense to run one of the built-in SQL Server reports, such as Disk Usage by Top Tables, which shows the largest tables in your database…

image

This will show the largest tables like in this example…

image

As you can see, the Stream.JobStreams table is the largest table in the SMA database.
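If you do not want to click through the report every time, roughly the same information can be pulled with a query against sys.dm_db_partition_stats. Again just a sketch, with an assumed server name:

# Approximate "Disk Usage by Top Tables" - lists tables by reserved space
Invoke-Sqlcmd -ServerInstance "SQLSERVER01" -Database "SMA" -Query @"
SELECT TOP (10)
       s.name + '.' + t.name AS TableName,
       SUM(CASE WHEN p.index_id IN (0, 1) THEN p.row_count ELSE 0 END) AS TableRows,
       SUM(p.reserved_page_count) * 8 / 1024 AS ReservedMB
FROM sys.dm_db_partition_stats AS p
JOIN sys.tables  AS t ON t.object_id = p.object_id
JOIN sys.schemas AS s ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY ReservedMB DESC
"@ | Format-Table -AutoSize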

This leads us to the question: isn't there some kind of grooming that would take care of this? The answer is yes, there is. According to TechNet:

  • By default, the database purge job runs every 15 minutes, and it runs only if there are records to purge.
  • Records are purged only if they are older than the default duration of 30 days. This time is configurable by using the Set-SmaAdminConfiguration cmdlet and setting the –PurgeJobsOlderThanCountDays parameter.
  • If the total job record count exceeds the MaxJobRecords parameter set by the same Set-SmaAdminConfiguration cmdlet, then more job records will be purged. The default value for this parameter is 120,000 records.

In order to check these settings we can run Get-SmaAdminConfiguration, which in our case shows the default settings…

image
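For reference, checking and adjusting these values from PowerShell could look like this. The endpoint is a placeholder and the new values are only examples:

$endpoint = "https://sma-server"   # hypothetical web service endpoint

# Show the current purge settings
Get-SmaAdminConfiguration -WebServiceEndpoint $endpoint

# Example: keep only 14 days of job history and lower the job record cap
Set-SmaAdminConfiguration -PurgeJobsOlderThanCountDays 14 -MaxJobRecords 60000 -WebServiceEndpoint $endpoint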

But how does grooming work in more detail? When you install SMA, a SQL Server Agent job called SMA Database Purge Job is created…

image

This job runs every 15 minutes and executes a stored procedure called Purge.PurgeNextBatch. This stored procedure triggers a bunch of other stored procedures to groom data and keep the database small and in consistent shape. But now let's have a look at the Purge.PurgeNextBatch stored procedure to understand WHEN it cleans records out of the SMA database.
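If you do not want to wait for the next 15-minute cycle, you can run what the Agent job runs by hand. A sketch, assuming the SqlServer module and sufficient permissions on the SMA database:

# Run one purge batch manually - this is what the SQL Agent job step executes
Invoke-Sqlcmd -ServerInstance "SQLSERVER01" -Database "SMA" -Query "EXEC [Purge].[PurgeNextBatch]"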

The grooming process will first delete all records that are older than 30 days, and this is done in batches of 1,000 records. If there are no records older than 30 days (based on the LastModifiedTime time stamp in the Core.Jobs table), but the row count of the Core.Jobs table is higher than the defined MaxJobRecords value of 120,000 (default), it will also start grooming out these records. If the row count of the Core.Jobs table is, for example, 120,900, the batch size will not be 1,000; instead it is reassigned to 900 and those records will be deleted.
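To see which of the two rules would kick in on your system, you can compare the Core.Jobs table against both thresholds. Another sketch with an assumed server name:

# How many job records exist, and how old is the oldest one?
Invoke-Sqlcmd -ServerInstance "SQLSERVER01" -Database "SMA" -Query @"
SELECT COUNT(*)              AS JobRecordCount,
       MIN(LastModifiedTime) AS OldestLastModifiedTime,
       DATEDIFF(DAY, MIN(LastModifiedTime), GETUTCDATE()) AS OldestAgeInDays
FROM [Core].[Jobs]
"@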

As I mentioned, the Purge.PurgeNextBatch stored procedure is just the main trigger for other stored procedures, seen here as Purge.xxxxxx…

image

After Purge.PurgeNextBatch has determined which records will be deleted, it passes the jobs to Purge.PurgeJobs, which takes care of these tables: Stream.JobStreams, Stats.JobStatusLog, Stats.JobSummary, WorkflowState.BinaryData, WorkflowState.TextData, Core.JobExceptions, Core.JobPendingAction, Core.JobStreamStatus, and Core.Jobs. As you can see, our large Stream.JobStreams table gets purged every time the job runs.
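A quick way to keep an eye on all the tables touched by Purge.PurgeJobs is to count their rows in one go. Again a sketch with assumed connection details:

# Row counts of the tables that the purge job grooms
$tables = "[Stream].[JobStreams]", "[Stats].[JobStatusLog]", "[Stats].[JobSummary]",
          "[WorkflowState].[BinaryData]", "[WorkflowState].[TextData]",
          "[Core].[JobExceptions]", "[Core].[JobPendingAction]",
          "[Core].[JobStreamStatus]", "[Core].[Jobs]"

foreach ($table in $tables) {
    $result = Invoke-Sqlcmd -ServerInstance "SQLSERVER01" -Database "SMA" -Query "SELECT COUNT_BIG(*) AS RecordCount FROM $table"
    "{0,-35} {1,12:N0}" -f $table, $result.RecordCount
}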

Well, this sounds great, but there is an issue if you are running the SMA database in a SQL Server Always On availability group. When you are building your Always On cluster, you need to be aware of certain things, as mentioned in this article: https://msdn.microsoft.com/en-us/library/hh270282.aspx

Logins and jobs are not the only information that need to be recreated on each of the server instances that hosts a secondary replica for a given availability group. For example, you might need to recreate server configuration settings, credentials, encrypted data, permissions, replication settings, service broker applications, triggers (at server level), and so forth.

As we have seen before, the grooming job is created when you install SMA. If you run the SMA database on a SQL Always On cluster, it could be that the job exists on node A (currently the secondary, read-only replica) while the SMA database is active on node B (the primary, read/write replica). This means the grooming job will never be able to succeed, and in the job history you will see entries like this…

image
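You can spot this situation from PowerShell as well by looking at the Agent job history in msdb. A sketch, assuming the job name from above and the SqlServer module:

# Last outcomes of the purge job on this node (run_status 0 = failed, 1 = succeeded)
Invoke-Sqlcmd -ServerInstance "SQLSERVER01" -Database "msdb" -Query @"
SELECT TOP (10) h.run_date, h.run_time, h.run_status, h.message
FROM dbo.sysjobhistory AS h
JOIN dbo.sysjobs       AS j ON j.job_id = h.job_id
WHERE j.name = 'SMA Database Purge Job'
  AND h.step_id = 0
ORDER BY h.instance_id DESC
"@ | Format-Table -AutoSize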

You can solve this issue by creating the grooming job on the node that does not have it configured yet. Right-click the job and select Script Job as > CREATE To > New Query Editor Window; this will script the job into a query window. Then run this script on the other node of the cluster.

image

Finally, you have a grooming job on both nodes. But there is another problem: the job will run on both nodes and fail on the node that is currently the secondary (read-only) replica, because the job cannot modify the database. Therefore, we need to check whether the job is running on the primary node; if so, the job can start, otherwise it should exit. Luckily, there is a snippet on Stack Exchange describing this kind of logic, so we just need to implement it in the job. How do we do that?

Open the job properties…

image

Click edit and add these lines of code…

-- Determine the Always On role of the local replica
DECLARE @ServerName NVARCHAR(256) = @@SERVERNAME
DECLARE @RoleDesc   NVARCHAR(60)

SELECT @RoleDesc = a.role_desc
    FROM sys.dm_hadr_availability_replica_states AS a
    JOIN sys.availability_replicas AS b
        ON b.replica_id = a.replica_id
    WHERE b.replica_server_name = @ServerName

-- Purge only if this node currently holds the primary (read/write) replica
IF @RoleDesc = 'PRIMARY'
BEGIN
    EXEC SMA.Purge.PurgeNextBatch
END
ELSE
    PRINT 'SMA Purge Job skipped - ' + @@SERVERNAME + ' is ' + @RoleDesc

Like this…

image

That's it! Every time the job runs (every 15 minutes), it checks whether it is running on the primary node and, if so, executes the purge on your tables.

I want to thank my buddy Fulvio Ferrarini for helping and working on this issue and for his awesome SQL knowledge, input and ideas!

I hope this helps you keeping the SMA database in good shape!
