CMS crash with core files and multiple report output generation

Former Member
0 Kudos

Happy new year to everyone,

Our BOXI R3.1 SP6 FP2 environment has recently started misbehaving: scheduled reports deliver multiple copies of their output to users' inboxes and send multiple email notifications. We have also noticed that the CMS crashes and generates a core file (almost 4 GB) at the time the duplicate report output is produced.

Most of the time the CMS crashes and restarts itself. On a few occasions, only the CMS service shut down.

Environment: RHEL 5.5 with 32 GB RAM and an 8-core processor on each clustered node, Oracle 10GR2.4 CMS DB server, Oracle 11GR2.4 reporting DB server, and Oracle 11.1.0.6 client.

2015/01/21 23:54:37.946|>=| | |28123|1534131088|{|||||||||||||||DBQueue::Read

2015/01/21 23:54:37.946|==| | |28123|1496185744| |||||||||||||||(OracleStatement.cpp:156) Prepare: SQL: SELECT ObjectID, Version, LastModifyTime, CRC, Properties FROM CMS_InfoObjects6 WHERE ObjectID IN (1004050) ORDER BY ObjectID

2015/01/21 23:54:37.946|==| | |28123|1496185744| ||||||||||||||(OracleStatement.cpp:183) Prepared statement Execute

2015/01/21 23:54:37.965|==| | |28123|1496451984| |||||||||||||||SResourceSource::LoadString 50293

2015/01/21 23:54:37.966|==| | |28123|1496451984| |||||||||||||||SResourceSource::LoadString Unknown exception in database thread

2015/01/21 23:54:37.967|==| | |28123|1496451984| |||||||||||||||SResourceSource::LoadString 33007

2015/01/21 23:54:37.967|==| | |28123|1496451984| |||||||||||||||SResourceSource::LoadString CMS is unstable and will shut down immediately. Reason: %1...

2015/01/21 23:54:38.506|==| | |28123|1496185744| |||||||||||||||(OracleStatement.cpp:156) Prepare: SQL: SELECT ObjectID, Version, LastModifyTime, CRC, Properties FROM CMS_InfoObjects6 WHERE ObjectID IN (1009213) ORDER BY ObjectID

2015/01/21 23:54:38.506|==| | |28123|1496185744| |||||||||||||||(OracleStatement.cpp:183) Prepared statement Execute

2015/01/21 23:54:38.512|==| | |28123|1455592672| |||||||||||||||(sidaemon.cpp:549) SUNIXDaemon::run: server restart flag is 1..

2015/01/21 23:54:38.513|==| | |28123|1455592672| |||||||||||||||(sidaemon.cpp:552) SUNIXDaemon::run: in abort ...

2015/01/21 23:54:38.513|==| | |28123|1455592672| |||||||||||||||(sidaemon.cpp:555) SUNIXDaemon::run: doing the WithAbort case ...

2015/01/21 23:54:38.520|==| | |28123|1496185744| |||||||||||||||(dbq.cpp:1357) DBQ: Time required to read 1 objects: 20.000000 ms

Thank you,

Karthik

Accepted Solutions (1)

Former Member
0 Kudos

Hi All,

The issue was resolved a few months ago, and I thought I would post the resolution here.

Cause: The CMS crashed while reading a large (inflated) string value of the metadata property SI_WEBIDOCPROPERTIES saved with the physical report. According to the SAP support engineer, the value may have grown abnormally through use of an incompatible Webi Rich Client version.

Resolution:

A. Appended the parameter "-maxobjectsincache 1000" to the CMS command line under CMC-->Servers-->CMS.

B. Reduced "SizeOfTheLargestObjectAllowedInTheCacheInBytes" to "10000" in the .registry file located at ../../bobje/data/.bobj/registry/software/business objects/suite 12.0/cms/instances/CMS_name.cms/

C. SAP also provided SDK LA fix code to clean up (remove) the large string value of the property SI_WEBIDOCPROPERTIES for all Webi documents in the system.

Preventive measure: As recommended by SAP, use BO client tools of the same version as the server.

Thanks,

Karthik

Answers (2)

0 Kudos

Hi Karthik,

Could you confirm whether all 14 or 15 connections to the DB are established?

- You can find this as an info message in the system log, /var/log/messages

- You can also check the active connections using

netstat -tulpn | grep 6400 >> /home/<username>/out.txt

or

netstat -an | grep 6400 >> /home/<username>/out.txt
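A plain `grep 6400` also matches ports such as 64001 and does not separate listening sockets from established ones. Note too that `-l` limits netstat to listening sockets, so the `-an` form is the one that lists established connections. As a minimal sketch, the hypothetical helper below (`count_established` is not part of any BOE tooling) counts only ESTABLISHED TCP connections on an exact port, assuming the usual Linux `netstat -an` column layout. The port is a parameter because the Oracle listener normally uses a different port from the CMS; 1521 in the usage line is an assumed default, not a value from this thread.

```shell
#!/bin/sh
# Hypothetical helper: count ESTABLISHED TCP connections whose local or
# foreign address ends in the given port. Reads `netstat -an` output on stdin.
count_established() {
    port="$1"
    awk -v port="$port" '
        # Columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State
        $1 ~ /^tcp/ && $6 == "ESTABLISHED" {
            if ($4 ~ (":" port "$") || $5 ~ (":" port "$")) n++
        }
        END { print n + 0 }
    '
}

# Usage (port numbers are assumptions, adjust to your environment):
#   netstat -an | count_established 6400
#   netstat -an | count_established 1521
```

Anchoring the match on `:port` at the end of the address field is what keeps a port like 64001 from being counted.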

Additionally, check whether there is any audit backlog in the /auditing directory.

-Guru

Former Member
0 Kudos

Hi Guru,

The CMS system log (/var/log/messages) shows that all 14 connections are eventually established, but with intermittent TNS errors, partial connectivity, and a system out-of-memory error along the way.

Jan 28 19:51:31 Linuxservername boe_cmsd[571]: CMS is unstable and will shut down immediately. Reason: BusinessObjects Enterprise CMS: Unable to connect to the CMS system database ""UKOLTU55"". Reason: ORA-12541: TNS:no listener

Jan 28 19:51:31 Linuxservername boe_cmsd[571]: The root server reported an error Initialization Failure. (Reason: BusinessObjects Enterprise CMS: Unable to connect to the CMS system database ""UKOLTU55"". Reason: ORA-12541: TNS:no listener BusinessObjects Enterprise CMS: Unable to connect to the CMS system database ""UKOLTU55"". Reason: ORA-12541: TNS:no listener  CDatabase::Open failure. ).

Jan 28 19:51:31 Linuxservername boe_cmsd[571]: Central Management Server stopped

Jan 28 19:51:31 Linuxservername boe_cmsd[575]: Central Management Server started

Jan 28 19:51:48 Linuxservername boe_cmsd[575]: BusinessObjects Enterprise CMS: Partially connected to CMS system database ""UKOLTU55"".  14 CMS system database connections were requested, but only 6 connections could be established. Reason: ORA-12541: TNS:no listener

...

Jan 28 19:52:48 Linuxservername boe_cmsd[575]: BusinessObjects Enterprise CMS: Partially connected to CMS system database ""UKOLTU55"".  14 CMS system database connections were requested, but only 10 connections could be established. Reason: ORA-12541: TNS:no listener

Jan 28 19:53:48 Linuxservername boe_cmsd[575]: BusinessObjects Enterprise CMS: Successfully established all 14 connections to CMS system database ""UKOLTU55"".

....

Jan 28 20:47:39 Linuxservername boe_cmsd[575]: CMS is unstable and will shut down immediately. Reason: System is out of memory.
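To make this pattern easier to follow over a longer period, here is a minimal sketch that pulls the connection progress and the shutdown reasons out of the syslog. Both function names are hypothetical, and the patterns assume the exact message wording shown in the excerpt above, so they may need adjusting for other versions.

```shell
#!/bin/sh
# Hypothetical helper: for each "Partially connected" message, print the
# timestamp followed by established/requested connection counts.
cms_db_connect_progress() {
    sed -n 's|^\([A-Z][a-z][a-z] *[0-9]* [0-9:]*\) .*boe_cmsd.* \([0-9][0-9]*\) CMS system database connections were requested, but only \([0-9][0-9]*\) connections could be established.*|\1 \3/\2|p'
}

# Hypothetical helper: print the reason text of every
# "CMS is unstable and will shut down immediately" message.
cms_shutdown_reasons() {
    sed -n 's|.*boe_cmsd.*CMS is unstable and will shut down immediately\. Reason: \(.*\)|\1|p'
}

# Usage:
#   cms_db_connect_progress < /var/log/messages
#   cms_shutdown_reasons   < /var/log/messages
```

Run against the excerpt above, the first function would list the 6/14 and 10/14 steps, and the second would show both the ORA-12541 reason and the out-of-memory reason.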

Thanks,

Karthik

denis_konovalov
Active Contributor
0 Kudos

Well, there you go: these system messages explain the problem. You have intermittent Oracle connectivity issues, and the CMS is running out of memory.

Fix the Oracle issues and try adding another CMS server.

former_member185603
Active Contributor
0 Kudos

Is it a VM server? If so, make sure the server has enough resources available at all times.

Former Member
0 Kudos

Hi Jawahar,

These are physical servers. The standalone QA environment works fine, but we are seeing this issue on the clustered nodes of UAT and prod.

Could you please give some pointers on how to verify the resources?

Thanks for your response,

Karthik

0 Kudos

Hello,

It looks like your report bursting is creating too much load. Check the following note:

http://service.sap.com/sap/support/notes/1463190

Also consider adjusting the ulimit values on your hosts.

Regards

-Seb.

denis_konovalov
Active Contributor
0 Kudos

Agree with Seb here.

Either ulimits are not up to the task or your CMS DB can't keep up.

Former Member
0 Kudos

Thanks Sebastian,

As per SAP L4's recommendation, I set the size of the largest object allowed in the cache to 10000, but I still observed the CMS crash.

Best regards,

Karthik

Former Member
0 Kudos

Thanks Denis,

We even tried increasing the open files ulimit from 1024 to 200000, but it seems to have backfired and produced more core files. Could you please share your thoughts on how to keep the CMS up?

Thanks,

Karthik

denis_konovalov
Active Contributor
0 Kudos

Can you post the output of the ulimit -a command here?

You need to troubleshoot this issue instead of looking for one answer that fixes everything.
For example, what is the load threshold that triggers the CMS server crash?
What happens with the CMS DB at that time? (Is your CMS DB on the same box as BOE or on a different one?)
What are the resource usage metrics at the time of failure?
Have you analyzed the core files themselves?

Former Member
0 Kudos

Hi Denis,

I have been doing my best for the last few weeks, together with SAP, to understand the root cause, but it is still a mystery.

> ulimit -a

core file size          (blocks, -c) 0

data seg size           (kbytes, -d) unlimited

scheduling priority             (-e) 0

file size               (blocks, -f) unlimited

pending signals                 (-i) 270335

max locked memory       (kbytes, -l) 32

max memory size         (kbytes, -m) unlimited

open files                      (-n) 1024

pipe size            (512 bytes, -p) 8

POSIX message queues     (bytes, -q) 819200

real-time priority              (-r) 0

stack size              (kbytes, -s) 10240

cpu time               (seconds, -t) unlimited

max user processes              (-u) 270335

virtual memory          (kbytes, -v) unlimited

file locks                      (-x) unlimited

Below are my observations from troubleshooting:

1. The CMS crashes when it reaches a threshold of about 3.9 GB.

2. The CMS DB sits on a different Linux server from the BOE server.

3. All core files were generated by the boe_cmsd process and are almost 4 GB in size (about the same as the threshold at which it crashes).

4. A shell script that I added on the BOE servers shows that the CMS DB is available and accepting connections at the time of the CMS crash.

5. SAP analysed the core files and is suspicious of the lines below.

     #3  0x58687b80 in skgesigCrash ()

      from /opt/oracle/product/11.1.0/client_1/lib32/libclntsh.so

     #4  0x58687e0d in skgesig_sigactionHandler ()

I'll continue troubleshooting in the hope of fixing it as soon as possible.

Thanks,

Karthik

denis_konovalov
Active Contributor
0 Kudos

Well, the open files ulimit is still 1024, which is well below what's needed.
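A minimal sketch of how that check could be scripted, so it can be repeated after each change: the helper names are hypothetical, and the 65536 minimum in the usage line is only an illustrative value, not a figure from SAP or from this thread. Limits are inherited when a process starts, so the check is only meaningful when run as the account that starts the BOE servers (bobje here), in the same kind of session.

```shell
#!/bin/sh
# Hypothetical helper: read `ulimit -a` output on stdin and print the
# value of the "open files" line (its last field).
open_files_limit() {
    awk '/^open files/ { print $NF }'
}

# Hypothetical helper: succeed when the given limit is the string
# "unlimited" or is numerically at least the given minimum.
open_files_ok() {
    limit="$1"
    min="$2"
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$min" ]
}

# Usage (65536 is an illustrative minimum, not a recommendation):
#   limit=$(ulimit -a | open_files_limit)
#   open_files_ok "$limit" 65536 || echo "open files limit too low: $limit"
```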

By thresholds I meant the report load in your scheduling. Does it crash when you have 10 report jobs, or 100, or somewhere in between?
XI3.1 is a 32-bit architecture and the CMS is a 32-bit process, so it can't reach 3.9 GB; 2 GB is the limit. So please clarify what you mean.

What exactly do your shell scripts do in relation to Oracle?
The CMS DB analysis has to be done on the Oracle DB itself: you need to see how many connections there are, how fast queries are returned, etc.

Basically, this sounds like you need a proper sizing exercise for the type of load you're placing on this XI3.1 system.

If this is the location of the crash:

#3  0x58687b80 in skgesigCrash () from /opt/oracle/product/11.1.0/client_1/lib32/libclntsh.so

then maybe the fact that your CMS DB is on version 10 and your client is on version 11 has something to do with it.

Per the supported platforms guide, if the CMS DB is on Oracle 10, the client has to be 10 as well.
Likewise, 11 goes with 11, and 11.2 with 11.2.

0 Kudos

Hello,

Please set the ulimit to unlimited.

Regards

-Seb.

Former Member
0 Kudos

Thanks Denis & Seb. I'll work with internal change management before changing the ulimit (to unlimited) and the Oracle client to 10GR2.4. Considering the reporting database (11GR2.4), we even tried installing the 11GR2.4 client as per SAP's recommendation, but it didn't help.

Below is the output of my memory analysis shell script, which shows the CMS crashing at 3.6g.

Wed Jan 28 19:05:35 GMT 2015

MemTotal:     32947392 kB

MemFree:      13285252 kB

Buffers:        920916 kB

Cached:       13169796 kB

SwapCached:        196 kB

PID USER            VIRT RES CPU% MEM% PROC_NAME

8295 bobje     25   3820m 3.6g 51m R 100.4 11.3   7:51.34 boe_cmsd         

9492 bobje     18   376m  252m  57m S  0.0  0.8   2:34.72 WIReportServer    

9589 bobje     18   278m  153m  57m S  0.0  0.5   0:42.87 WIReportServer

**************EOL******************

Wed Jan 28 19:05:45 GMT 2015

MemTotal:     32947392 kB

MemFree:      16565332 kB

Buffers:        921304 kB

Cached:       13554876 kB

SwapCached:        196 kB

PID USER             VIRT RES CPU% MEM% PROC_NAME

11260 bobje    15   88380  28m  18m S 46.5  0.1   0:04.36 boe_cmsd          

9492 bobje     18   376m 252m  57m S  0.0  0.8   2:34.73 WIReportServer    

9589 bobje     18   278m 153m  57m S  0.0  0.5   0:42.87 WIReportServer
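For context on these figures: a 32-bit process can address at most 4 GiB (4096 MiB) of virtual memory, and how much of that is actually usable depends on the kernel configuration, which is not confirmed in this thread. A minimal sketch, with hypothetical helper names, that converts a size field from top into MiB and expresses it as a percentage of that 4 GiB ceiling, assuming top prints plain numbers in KiB and uses the `m` and `g` suffixes seen above:

```shell
#!/bin/sh
# Hypothetical helper: convert a top(1) size field such as "3820m",
# "3.6g" or "88380" (plain numbers taken as KiB) to whole MiB.
top_size_to_mib() {
    echo "$1" | awk '
        /g$/ { sub(/g$/, ""); printf "%d\n", $0 * 1024; next }
        /m$/ { sub(/m$/, ""); printf "%d\n", $0; next }
             { printf "%d\n", $0 / 1024 }
    '
}

# Hypothetical helper: percentage (rounded down) of a 4096 MiB
# address space represented by the given top size field.
addr_space_pct() {
    mib=$(top_size_to_mib "$1")
    echo $(( mib * 100 / 4096 ))
}

# Usage, with the VIRT value from the first sample above:
#   addr_space_pct 3820m
```

With the VIRT value of 3820m from the first sample, this gives 93, which means the process was already close to the 4 GiB ceiling just before the restart seen in the second sample.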

I'll keep you posted on the result after ulimit change.

Thanks,

Karthik