SCOM 2012 – Linux Discovery “Unspecified failure”

Today I had a very interesting error to troubleshoot. A customer has about 150 Linux servers that he wants to monitor with SCOM 2012. He was able to deploy the Linux agent to all of them except 4. After each attempt to discover one of these troublesome servers, the discovery wizard ended shortly after starting with a warning.

linuxerror

According to the Linux guy, all servers run the same version of Oracle Linux, which is exactly the same as the corresponding Red Hat Linux release except for the logo.

1) I verified that ports TCP 22 (SSH) and TCP 1270 are open to the Linux servers and that the Linux release is the same as on the more than 146 other Linux servers, which were discovered without any issues.
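
If many servers have to be verified, the port check from step 1 can be scripted. The following is a minimal Python sketch and not part of the original troubleshooting; the function names “port_is_open” and “check_agent_ports” are illustrative assumptions, and the only facts taken from this post are the two port numbers.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_agent_ports(host: str, ports=(22, 1270)) -> dict:
    """Check the two ports named in step 1 and return {port: reachable}."""
    return {port: port_is_open(host, port) for port in ports}
```

Run it from the management server that performs the discovery, for example check_agent_ports("linux01.example.com") with your own host name. A successful TCP connection only shows that the port is reachable; it says nothing about whether the service behind it is configured correctly.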

2) Next, on each management server I created a file called “EnableOpsMgrModuleLogging” in the “c:\Windows\TEMP” directory by executing the command:

COPY /Y NUL %windir%\TEMP\EnableOpsMgrModuleLogging

This enables debug logging, which is especially helpful if you have problems running the Linux discovery wizard. After creating the file I restarted the “HealthService” service on each management server to make sure the setting takes effect.
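
If you prefer a script over the COPY command, the same marker file can be created as sketched below. This is only an illustration under assumptions: the function name “enable_module_logging” is made up, and the directory is passed in as a parameter so the sketch does not hard-code any path.

```python
from pathlib import Path

# File name taken from step 2 of this post.
MARKER_NAME = "EnableOpsMgrModuleLogging"


def enable_module_logging(temp_dir: str) -> Path:
    """Create an empty marker file in temp_dir and return its path."""
    marker = Path(temp_dir) / MARKER_NAME
    # Like COPY /Y NUL, this creates the file or truncates an existing one.
    marker.write_bytes(b"")
    return marker
```

On a management server you would call it with the directory used above, for example enable_module_logging(r"C:\Windows\TEMP"). The sketch does not restart anything, so the “HealthService” restart is still required afterwards.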

3) At this point I ran the discovery wizard again, and immediately several files were created in the “c:\Windows\TEMP” directory. The files are not necessarily created on the server where you run the discovery wizard, so check the “c:\Windows\TEMP” directory on every management server on which you enabled debug logging.

image
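
Because the TEMP directory usually contains plenty of unrelated files, it can help to list only the log files written during the last discovery attempt. The sketch below is one way to do that; the function name “recent_logs” and the 10 minute default are assumptions, and it presumes the debug files end in “.log”, as the file names mentioned in this post do.

```python
import time
from pathlib import Path
from typing import List


def recent_logs(temp_dir: str, max_age_seconds: float = 600.0) -> List[Path]:
    """Return *.log files in temp_dir modified within max_age_seconds, newest first."""
    cutoff = time.time() - max_age_seconds
    logs = [
        p for p in Path(temp_dir).glob("*.log")
        if p.is_file() and p.stat().st_mtime >= cutoff
    ]
    return sorted(logs, key=lambda p: p.stat().st_mtime, reverse=True)
```

Run it on each management server right after a discovery attempt, for example recent_logs(r"C:\Windows\TEMP"), and the server that actually handled the discovery is the one that returns a non-empty list.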

4) I checked each of the debug files for errors or inconsistencies and found some interesting output in “SSHCommandProbe.log”. As you can see, a shell script called “GetOSVersion.sh” is executed on the Linux machine to identify the Linux version. In my case the output was “Unknown”, so the discovery wizard could not determine which release it was dealing with.

image

5) Next I inspected the “GetOSVersion.sh” script, which you can find in the following directory on your management server:

image

As you can see, it checks the “/etc/redhat-release” file for the string “Red Hat Enterprise Linux” to determine the operating system name and version.

image

6) Now I compared the content of the “/etc/redhat-release” file on the troublesome servers with that on the already discovered Linux servers. On the troublesome servers I found the following string:

Enterprise Linux Enterprise Linux Server release 5.2 (Carthage)

instead of something like

Red Hat Enterprise Linux Server release 5.2 (Tikanga)

After changing the content of “/etc/redhat-release” to the appropriate release string, the discovery worked like a charm. Cool!
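
The mismatch from steps 5 and 6 can be reproduced with a small sketch. Note that this is not the real “GetOSVersion.sh”; it only mimics the one check described above (looking for the string “Red Hat Enterprise Linux”), and the function name “detect_os” as well as the version pattern are my own assumptions.

```python
import re

# Assumption: only the string check described in step 5 is modelled here.
EXPECTED_NAME = "Red Hat Enterprise Linux"


def detect_os(release_text: str):
    """Return (os_name, version), or ("Unknown", None) if the name is missing."""
    line = release_text.strip()
    if EXPECTED_NAME not in line:
        return ("Unknown", None)
    # Pick up the number after the word "release", e.g. "5.2".
    match = re.search(r"release\s+(\d+(?:\.\d+)*)", line)
    version = match.group(1) if match else None
    return (EXPECTED_NAME, version)
```

Fed with the string from the troublesome servers, “Enterprise Linux Enterprise Linux Server release 5.2 (Carthage)”, the sketch returns “Unknown”, while the Red Hat style string yields the name together with version 5.2. A check like this can be run against the release file of any server before you start the discovery wizard.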

Enabling this debug logging is very useful if you don’t get a meaningful discovery error. You might also want to check the TechNet article about logging and debugging.

Cheers,

Stefan

2 Comments

  1. I am getting a “Discovery was unsuccessful” error (SCOM 2012 SP1, Red Hat Linux agent):

    Unexpected DiscoveryResult.ErrorData type. Please file bug report.
    ErrorData: Microsoft.SystemCenter.CrossPlatform.ClientLibrary.MPAbstractions.WSManUnknownErrorException
    The SSL connection cannot be established. Verify that the service on the remote host is properly configured to listen for HTTPS requests. Consult the logs and documentation for the WS-Management service running on the destination, most commonly IIS or WinRM. If the destination is the WinRM service, run the following command on the destination to analyze and configure the WinRM service: “winrm quickconfig -transport:https”.
    at System.Activities.WorkflowApplication.Invoke(Activity activity, IDictionary`2 inputs, WorkflowInstanceExtensionManager extensions, TimeSpan timeout)
    at System.Activities.WorkflowInvoker.Invoke(Activity workflow, IDictionary`2 inputs, TimeSpan timeout, WorkflowInstanceExtensionManager extensions)
    at Microsoft.SystemCenter.CrossPlatform.ClientActions.DefaultDiscovery.InvokeWorkflow(IManagedObject managementActionPoint, DiscoveryTargetEndpoint criteria, IInstallableAgents installableAgents)

    ****************************************

    I enabled the logging as described.
    The following entries show up in the logs:

    SCXNameResolverProbe.log
    0: 01/24/13 14:54:44 : Enter SCXNameResolverProbe constructor
    0: 01/24/13 14:54:44 : Leave SCXNameResolverProbe constructor
    1: 01/24/13 14:54:44 : Enter SCXNameResolverProbe::DoInit
    1: 01/24/13 14:54:44 : XML_INIT_CALL
    1: 01/24/13 14:54:44 : Check WSAStartup
    1: 01/24/13 14:54:44 : Exit SCXNameResolverProbe::DoInit
    2: 01/24/13 14:54:44 : Enter SCXNameResolverProbe::DoProcess()
    2: 01/24/13 14:54:44 : SCXNameResolverProbe::DoProcess passed initial arguments checking
    2: 01/24/13 14:54:44 : SCXNameResolverProbe::DoProcess input: Linux01
    2: 01/24/13 14:54:44 : SCXNameResolverProbe::DoProcess – Input is a host name
    3: 01/24/13 14:54:44 : Enter SCXNameResolverProbe::GetIP(): Linux01
    3: 01/24/13 14:54:44 : SCXNameResolverProbe::GetIP() – IPv4 address family.
    3: 01/24/13 14:54:44 : SCXNameResolverProbe::GetIP() – Found a good IP: 10.x.x.x
    3: 01/24/13 14:54:44 : Leave SCXNameResolverProbe::GetIP(): 10.x.x.x
    2: 01/24/13 14:54:44 : SCXNameResolverProbe::DoProcess – Perfom the reverse lookup
    4: 01/24/13 14:54:44 : Enter SCXNameResolverProbe::GetName(): 10.x.x.x
    4: 01/24/13 14:54:44 : Resolve IPv4 address to host name
    4: 01/24/13 14:54:44 : Leave SCXNameResolverProbe::GetName(): Linux01.domain.com
    2: 01/24/13 14:54:44 : SCXNameResolverProbe::DoProcess returns name: Linux01.domain.com
    2: 01/24/13 14:54:44 : SCXNameResolverProbe::DoProcess returns ip: 10.x.x.x
    2: 01/24/13 14:54:44 : SCXNameResolverProbe::DoProcess returns errorcode: 0
    2: 01/24/13 14:54:44 : SCXNameResolverProbe::DoProcess returns errortext:
    2: 01/24/13 14:54:44 : SCXNameResolverProbe::DoProcess returns xml: Linux01.domain.com10.x.x.x0
    2: 01/24/13 14:54:44 : Enter initDataHolder
    2: 01/24/13 14:54:44 : Enter initDataType
    2: 01/24/13 14:54:44 : initDataType initializing output datatype
    2: 01/24/13 14:54:44 : Leave initDataType
    2: 01/24/13 14:54:44 : Leave initDataHolder
    2: 01/24/13 14:54:44 : Leave SCXNameResolverProbe::DoProcess
    5: 01/24/13 14:54:44 : Enter SCXNameResolverProbe destructor
    5: 01/24/13 14:54:44 : SCXNameResolverProbe destructor – FreeLibrary failed: 0
    5: 01/24/13 14:54:44 : Leave SCXNameResolverProbe destructor

    SCXWSManProbeAction.log
    0: 01/24/13 14:54:44 : Enter SCXWSManProbeAction::DoInit
    0: 01/24/13 14:54:44 : XML_INIT_CALL
    0: 01/24/13 14:54:44 : Exit SCXWSManProbeAction::DoInit
    1: 01/24/13 14:54:44 : Enter SCXWSManProbeAction::DoProcess
    1: 01/24/13 14:54:44 : passed initial arguments validation
    1: 01/24/13 14:54:44 : validate credentials
    1: 01/24/13 14:54:44 : username: scomuser
    1: 01/24/13 14:54:44 : ** username: scomuser
    1: 01/24/13 14:54:44 : elevType: 0
    1: 01/24/13 14:54:44 : credentials appear valid
    1: 01/24/13 14:54:44 : inputString: “”
    1: 01/24/13 14:54:44 : ** inputString: “”
    1: 01/24/13 14:54:44 : Enter initDataHolder
    1: 01/24/13 14:54:44 : Enter initDataType
    1: 01/24/13 14:54:44 : initDataType initializing output datatype
    1: 01/24/13 14:54:44 : Leave initDataType
    1: 01/24/13 14:54:44 : Leave initDataHolder
    1: 01/24/13 14:54:44 : Exit SCXWSManProbeAction::DoProcess

    After that the discovery fails. I do not see any other logs on any of the management servers.

    I was following your blog for Linux agent monitoring.
