on 04-08-2016 7:43 PM
Hi All,
I received this alert but can't find anything in the logs to back it up. I know this is a canned watch. The only thing that comes close is that the CMS seems to have restarted automatically for some reason, since its Date Modified value in the CMC shows that. If the CMS restarts, then everything on that server would also restart. The other servers have older Date Modified entries.
What would cause this to happen?
Subject: ALL_SERVICES_ALERTS Danger Event
Danger Rule evaluated to true for "ALL_SERVICES_ALERTS" watch.
Danger Rule: BOProdCluster.APS.Visualization$'Health State'==0 || BOProdCluster.APS.Visualization$'Health State'==5 || BOProdCluster.APS.Analysis$'Health State'==0 || BOProdCluster.APS.Analysis$'Health State'==5 || BOProdCluster.APS.Auditing$'Health State'==0 || BOProdCluster.APS.Auditing$'Health State'==5 || BOProdCluster.APS.Connectivity$'Health State'==0 || BOProdCluster.APS.Core$'Health State'==0 || BOProdCluster.APS.Core$'Health State'==0 || BOProdCluster.APS.DF$'Health State'==0 || BOProdCluster.APS.LCM$'Health State'==0 || BOProdCluster.APS.Monitoring$'Health State'==0 || BOProdCluster.APS.Search$'Health State'==0 || BOProdCluster.APS.WEBI$'Health State'==0 || BOProdCluster.APS.WEBIDSLBridge$'Health State'==0 || BOProdCluster.APS.WEBIDSLBridge1$'Health State'==0 || BOProdCluster.AdaptiveJobServer$'Health State'==0 || BOProdCluster.CentralManagementServer$'Health State'==0 || BOProdCluster.ConnectionServer$'Health State'==0 || BOProdCluster.ConnectionServer32$'Health State'==0 || BOProdCluster.ConnectionServer32$'Health State'==0 || BOProdCluster.InputFileRepository$'Health State'==0 || BOProdCluster.OutputFileRepository$'Health State'==0 || BOProdCluster.WebApplicationContainerServer$'Health State'==0 || BOProdCluster.WebIntelligenceProcessingServer$'Health State'==0 || BOProdCluster.WebIntelligenceProcessingServer1$'Health State'==0 || BOProdCluster.WebIntelligenceProcessingServer2$'Health State'==0 || BOProdCluster.WebIntelligenceProcessingServer3$'Health State'==0 || Cluster58.APS.Analysis$'Health State'==0 || Cluster58.APS.Auditing$'Health State'==0 || Cluster58.APS.Connectivity$'Health State'==0 || Cluster58.APS.Core$'Health State'==0 || Cluster58.APS.DF$'Health State'==0 || Cluster58.APS.LCM$'Health State'==0 || Cluster58.APS.Search$'Health State'==0 || Cluster58.APS.Visualization$'Health State'==0 || Cluster58.APS.WEBI$'Health State'==0 || Cluster58.APS.WEBIDSLBridge$'Health State'==0 || Cluster58.APS.WEBIDSLBridge1$'Health State'==0 || 
Cluster58.AdaptiveJobServer$'Health State'==0 || Cluster58.CentralManagementServer$'Health State'==0 || Cluster58.ConnectionServer$'Health State'==0 || Cluster58.ConnectionServer32$'Health State'==0 || Cluster58.DashboardsCacheServer$'Health State'==0 || Cluster58.DashboardsProcessingServer$'Health State'==0 || Cluster58.EventServer$'Health State'==0 || Cluster58.InputFileRepository$'Health State'==0 || Cluster58.OutputFileRepository$'Health State'==0 || Cluster58.WebApplicationContainerServer$'Health State'==0 || Cluster58.WebIntelligenceProcessingServer$'Health State'==0 || Cluster58.WebIntelligenceProcessingServer1$'Health State'==0 || Cluster58.WebIntelligenceProcessingServer2$'Health State'==0 || Cluster58.WebIntelligenceProcessingServer3$'Health State'==0 || BOProdCluster.APS.Connectivity$'Health State'==0 || BOProdCluster.APS.Core$'Health State'==0 || BOProdCluster.APS.Core$'Health State'==0 || BOProdCluster.APS.DF$'Health State'==0 || BOProdCluster.APS.LCM$'Health State'==0 || BOProdCluster.APS.Monitoring$'Health State'==0 || BOProdCluster.APS.Search$'Health State'==0 || BOProdCluster.APS.WEBI$'Health State'==0 || BOProdCluster.APS.WEBIDSLBridge$'Health State'==0 || BOProdCluster.APS.WEBIDSLBridge1$'Health State'==0 || BOProdCluster.AdaptiveJobServer$'Health State'==0 || BOProdCluster.CentralManagementServer$'Health State'==0 || BOProdCluster.ConnectionServer$'Health State'==0 || BOProdCluster.ConnectionServer32$'Health State'==0 || BOProdCluster.ConnectionServer32$'Health State'==0 || BOProdCluster.InputFileRepository$'Health State'==0 || BOProdCluster.OutputFileRepository$'Health State'==0 || BOProdCluster.WebApplicationContainerServer$'Health State'==0 || BOProdCluster.WebIntelligenceProcessingServer$'Health State'==0 || BOProdCluster.WebIntelligenceProcessingServer1$'Health State'==0 || BOProdCluster.WebIntelligenceProcessingServer2$'Health State'==0 || BOProdCluster.WebIntelligenceProcessingServer3$'Health State'==0 ||
Cluster58.APS.Analysis$'Health State'==0 || Cluster58.APS.Auditing$'Health State'==0 || Cluster58.APS.Connectivity$'Health State'==0 || Cluster58.APS.Core$'Health State'==0 || Cluster58.APS.DF$'Health State'==0 || Cluster58.APS.LCM$'Health State'==0 || Cluster58.APS.Search$'Health State'==0 || Cluster58.APS.Visualization$'Health State'==0 || Cluster58.APS.WEBI$'Health State'==0 || Cluster58.APS.WEBIDSLBridge$'Health State'==0 || Cluster58.APS.WEBIDSLBridge1$'Health State'==0 || Cluster58.AdaptiveJobServer$'Health State'==0 || Cluster58.CentralManagementServer$'Health State'==0 || Cluster58.ConnectionServer$'Health State'==0 || Cluster58.ConnectionServer32$'Health State'==0 || Cluster58.DashboardsCacheServer$'Health State'==0 || Cluster58.DashboardsProcessingServer$'Health State'==0 || Cluster58.EventServer$'Health State'==0 || Cluster58.InputFileRepository$'Health State'==0 || Cluster58.OutputFileRepository$'Health State'==0 || Cluster58.WebApplicationContainerServer$'Health State'==0 || Cluster58.WebIntelligenceProcessingServer$'Health State'==0 || Cluster58.WebIntelligenceProcessingServer1$'Health State'==0 || Cluster58.WebIntelligenceProcessingServer2$'Health State'==0 || Cluster58.WebIntelligenceProcessingServer3$'Health State'==0
The metrics that have crossed their respective thresholds:
BOProdCluster.CentralManagementServer$'Health State'
BOProdCluster.CentralManagementServer$'Health State'
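Every term in the Danger Rule is OR-ed together, so the watch fires as soon as any single metric matches. The rule as pasted also repeats some terms: BOProdCluster.APS.Core and BOProdCluster.ConnectionServer32 each appear twice in a row, and BOProdCluster.CentralManagementServer appears twice overall. One plausible (unconfirmed) reason the CMS metric is listed twice under "crossed their respective thresholds" is that the notification reports each matching term rather than each distinct metric. The Python sketch below only models that idea; `triggered_terms`, `duplicate_terms`, and the metric values are hypothetical and are not part of the BI platform.

```python
from collections import Counter

# Hypothetical metric snapshot: names mirror the alert, values are invented.
METRICS = {
    "BOProdCluster.CentralManagementServer$'Health State'": 0,
    "BOProdCluster.AdaptiveJobServer$'Health State'": 2,
}

# Short excerpt of the pasted rule, keeping the repeated CMS term.
RULE = (
    "BOProdCluster.CentralManagementServer$'Health State'==0 || "
    "BOProdCluster.AdaptiveJobServer$'Health State'==0 || "
    "BOProdCluster.CentralManagementServer$'Health State'==0"
)


def triggered_terms(rule, values):
    """Return the metric name of every true `<metric>==<int>` term in an OR-rule.

    Simplified model: only handles equality terms joined by '||'.
    """
    hits = []
    for term in rule.split("||"):
        metric, _, expected = term.strip().rpartition("==")
        # Skip empty or unknown terms; compare the value against the threshold.
        if metric in values and values[metric] == int(expected):
            hits.append(metric)
    return hits


def duplicate_terms(rule):
    """Return the terms that appear more than once in the rule, sorted."""
    counts = Counter(term.strip() for term in rule.split("||"))
    return sorted(term for term, n in counts.items() if n > 1)
```

With a stopped CMS, `triggered_terms(RULE, METRICS)` reports the CMS metric once per matching term, i.e. twice here, and `duplicate_terms(RULE)` shows which terms could be removed from the watch to tidy it up.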
Appreciate any suggestions.
BW
Hey Bill,
If the CMS restarted, that would trigger this alert, since the BOProdCluster.CentralManagementServer$'Health State' condition in the watch is triggered whenever the server is stopped.
One thing you can do is edit the watch and change the threshold so that it only triggers if the watch has been in the danger state for more than, say, 10 minutes. That way, a simple server restart won't raise an alert.
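The idea behind a duration threshold is that a brief dip into the danger state (such as a restart) is ignored, while a sustained one still alerts. The Python sketch below illustrates only that logic; it is not how the BI platform's monitoring engine implements watch thresholds, and `sustained_danger` and its `(timestamp, in_danger)` sample format are hypothetical.

```python
from datetime import datetime, timedelta


def sustained_danger(samples, min_duration=timedelta(minutes=10)):
    """Return True only if the latest unbroken run of danger samples has
    lasted longer than `min_duration`.

    `samples` is a chronologically ordered list of (timestamp, in_danger) pairs.
    """
    run_start = None
    for ts, in_danger in samples:
        if in_danger:
            # Remember when the current danger run began.
            if run_start is None:
                run_start = ts
        else:
            # Any healthy sample resets the run (e.g. the server came back up).
            run_start = None
    if run_start is None:
        return False
    return samples[-1][0] - run_start > min_duration


t0 = datetime(2016, 4, 8, 19, 0)
# A quick restart: danger for a few minutes, then healthy again.
restart = [(t0, False), (t0 + timedelta(minutes=1), True),
           (t0 + timedelta(minutes=3), True), (t0 + timedelta(minutes=4), False)]
# A real outage: danger for 11 consecutive minutes.
outage = [(t0 + timedelta(minutes=m), True) for m in range(12)]
```

Here `sustained_danger(restart)` returns `False` and `sustained_danger(outage)` returns `True`, which is the behaviour you would want from the edited watch.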
Cheers
Toby
Hey Bill,
If the CMS is restarted, the Application event log on the server where the CMS runs will contain a warning entry from the source Server Intelligence Agent that says something like:
[Node Name: BI42LCM2]
[User Name: BI42LCM2-0$]
Server Intelligence Agent is requesting server BI42LCM2.CentralManagementServer to terminate.
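If you have exported the Application log messages to plain text (the export step and format are assumptions here; adapt the input to whatever your export produces), a short Python sketch can pull out every termination request. The pattern is based only on the sample message above, so the exact wording in your environment may differ, and `terminated_servers` is a hypothetical helper.

```python
import re

# Based on the sample SIA warning text, e.g.
# "Server Intelligence Agent is requesting server BI42LCM2.CentralManagementServer to terminate."
TERMINATE_RE = re.compile(
    r"Server Intelligence Agent is requesting server (\S+) to terminate"
)


def terminated_servers(lines):
    """Return the server name from every SIA termination request in `lines`."""
    found = []
    for line in lines:
        match = TERMINATE_RE.search(line)
        if match:
            found.append(match.group(1))
    return found


sample = [
    "[Node Name: BI42LCM2]",
    "[User Name: BI42LCM2-0$]",
    "Server Intelligence Agent is requesting server BI42LCM2.CentralManagementServer to terminate.",
]
```

Running `terminated_servers(sample)` yields the single CMS entry; filtering the result for names ending in `.CentralManagementServer` narrows it to CMS restarts.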
You could also go to CMC -> Monitoring -> Metrics, select the 'Server Running State' metric, view its history, and change the date range to see when the server was stopped over the past days, weeks, or months.
Regards
Toby