on 01-25-2015 8:00 AM
Happy new year to everyone,
Our BOXIR3.1SP6FP2 environment has recently started behaving oddly: scheduled reports deliver multiple copies of their output to users' inboxes and email notifications. We have also noticed that the CMS crashes and generates a core file (almost 4 GB) whenever the duplicate output is produced.
Most of the time, the CMC crashes and recycles itself. A few times, the CMS services alone shut down.
OS details: RHEL 5.5, 32 GB RAM and an 8-core processor on each clustered node, Oracle 10GR2.4 CMS DB server, Oracle 11GR2.4 reporting DB server and Oracle 11.1.0.6 client.
2015/01/21 23:54:37.946|>=| | |28123|1534131088|{|||||||||||||||DBQueue::Read
2015/01/21 23:54:37.946|==| | |28123|1496185744| |||||||||||||||(OracleStatement.cpp:156) Prepare: SQL: SELECT ObjectID, Version, LastModifyTime, CRC, Properties FROM CMS_InfoObjects6 WHERE ObjectID IN (1004050) ORDER BY ObjectID
2015/01/21 23:54:37.946|==| | |28123|1496185744| ||||||||||||||(OracleStatement.cpp:183) Prepared statement Execute
2015/01/21 23:54:37.965|==| | |28123|1496451984| |||||||||||||||SResourceSource::LoadString 50293
2015/01/21 23:54:37.966|==| | |28123|1496451984| |||||||||||||||SResourceSource::LoadString Unknown exception in database thread
2015/01/21 23:54:37.967|==| | |28123|1496451984| |||||||||||||||SResourceSource::LoadString 33007
2015/01/21 23:54:37.967|==| | |28123|1496451984| |||||||||||||||SResourceSource::LoadString CMS is unstable and will shut down immediately. Reason: %1...
2015/01/21 23:54:38.506|==| | |28123|1496185744| |||||||||||||||(OracleStatement.cpp:156) Prepare: SQL: SELECT ObjectID, Version, LastModifyTime, CRC, Properties FROM CMS_InfoObjects6 WHERE ObjectID IN (1009213) ORDER BY ObjectID
2015/01/21 23:54:38.506|==| | |28123|1496185744| |||||||||||||||(OracleStatement.cpp:183) Prepared statement Execute
2015/01/21 23:54:38.512|==| | |28123|1455592672| |||||||||||||||(sidaemon.cpp:549) SUNIXDaemon::run: server restart flag is 1..
2015/01/21 23:54:38.513|==| | |28123|1455592672| |||||||||||||||(sidaemon.cpp:552) SUNIXDaemon::run: in abort ...
2015/01/21 23:54:38.513|==| | |28123|1455592672| |||||||||||||||(sidaemon.cpp:555) SUNIXDaemon::run: doing the WithAbort case ...
2015/01/21 23:54:38.520|==| | |28123|1496185744| |||||||||||||||(dbq.cpp:1357) DBQ: Time required to read 1 objects: 20.000000 ms
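In a trace like the one above it can help to group messages by thread ID, to see which thread logged the exception and which ones were running SQL at the time. The following is only a rough sketch: the field positions (timestamp first, process ID fifth, thread ID sixth, message last) are inferred from the pasted lines, not from any documented trace format, and the function names are made up for illustration.

```python
from collections import defaultdict


def parse_trace_line(line):
    """Split one pipe-delimited trace line into (timestamp, pid, thread_id, message).

    Returns None for lines that do not match the assumed layout,
    such as wrapped SQL continuation lines.
    """
    parts = line.rstrip("\n").split("|")
    if len(parts) < 7 or not (parts[4].isdigit() and parts[5].isdigit()):
        return None
    return parts[0], parts[4], parts[5], parts[-1].strip()


def messages_by_thread(lines):
    """Group (timestamp, message) pairs by thread ID."""
    threads = defaultdict(list)
    for line in lines:
        parsed = parse_trace_line(line)
        if parsed:
            timestamp, _pid, thread_id, message = parsed
            threads[thread_id].append((timestamp, message))
    return dict(threads)


def threads_mentioning(lines, needle):
    """Return the sorted thread IDs whose messages contain the given text."""
    grouped = messages_by_thread(lines)
    return sorted(
        tid for tid, msgs in grouped.items()
        if any(needle in msg for _ts, msg in msgs)
    )


# Three lines copied from the trace above.
SAMPLE = [
    "2015/01/21 23:54:37.946|==| | |28123|1496185744| ||||||||||||||(OracleStatement.cpp:183) Prepared statement Execute",
    "2015/01/21 23:54:37.966|==| | |28123|1496451984| |||||||||||||||SResourceSource::LoadString Unknown exception in database thread",
    "2015/01/21 23:54:38.512|==| | |28123|1455592672| |||||||||||||||(sidaemon.cpp:549) SUNIXDaemon::run: server restart flag is 1..",
]

if __name__ == "__main__":
    print(threads_mentioning(SAMPLE, "Unknown exception"))
```

On these lines, the exception is reported by thread 1496451984, while the prepared statements run on thread 1496185744.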
Thank you,
Karthik
Hi All,
The issue was resolved a few months ago, and I thought I would post the resolution here.
Cause: The CMS crashed while reading a large or inflated string value for the metadata property SI_WEBIDOCPROPERTIES saved with the physical report. According to the SAP support engineer, this value might have grown abnormally through the use of an incompatible Webi Rich Client version.
Resolution:
A. Appended the parameter "-maxobjectsincache 1000" to the CMS command line under CMC-->Servers-->CMS.
B. Reduced "SizeOfTheLargestObjectAllowedInTheCacheInBytes" to "10000" in the .registry file located at ../../bobje/data/.bobj/registry/software/business objects/suite 12.0/cms/instances/CMS_name.cms/
C. SAP also provided SDK LA fix code to clean up or remove the large string value of the property SI_WEBIDOCPROPERTIES for all Webi documents present in the system.
Preventive measure: As per SAP's recommendation, use BO client tools of the same version as the server.
Thanks,
Karthik
Hi Karthik,
Could you confirm whether all 14 or 15 connections are established with the DB?
- You can find this as an info message in the system log, /var/log/messages
- You can also check the active connections using
netstat -tulpn | grep 6400 >> /home/<username>/out.txt
or
netstat -an | grep 6400 >> /home/<username>/out.txt
Additionally, check whether there are any audit backlogs in the /auditing directory.
-Guru
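The netstat output saved by the commands above can be summarised per connection state rather than read by eye. The sketch below assumes the usual Linux `netstat -an` column layout (protocol, receive queue, send queue, local address, remote address, state). The function name and the sample addresses are made up for illustration; port 6400 is taken from the commands above.

```python
from collections import Counter


def count_states(netstat_output, port):
    """Count TCP connection states for sockets whose local or remote port matches."""
    suffix = ":" + str(port)
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto recv-q send-q local-address remote-address state
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        local, remote, state = fields[3], fields[4], fields[5]
        if local.endswith(suffix) or remote.endswith(suffix):
            states[state] += 1
    return states


# Illustrative sample only, not output from the system discussed in this thread.
SAMPLE = """\
tcp        0      0 0.0.0.0:6400      0.0.0.0:*         LISTEN
tcp        0      0 10.0.0.5:6400     10.0.0.9:51544    ESTABLISHED
tcp        0      0 10.0.0.5:6400     10.0.0.9:51545    ESTABLISHED
tcp        0      0 10.0.0.5:16400    10.0.0.9:51546    ESTABLISHED
"""

if __name__ == "__main__":
    for state, count in sorted(count_states(SAMPLE, 6400).items()):
        print(state, count)
```

Matching on the `:6400` suffix instead of a plain `grep 6400` avoids counting unrelated ports such as 16400.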
Hi Guru,
The CMS system log (/var/log/messages) shows that all 14 connections are eventually established, but with intermediate TNS errors, partial connectivity, and a system out-of-memory error.
Jan 28 19:51:31 Linuxservername boe_cmsd[571]: CMS is unstable and will shut down immediately. Reason: BusinessObjects Enterprise CMS: Unable to connect to the CMS system database ""UKOLTU55"". Reason: ORA-12541: TNS:no listener
Jan 28 19:51:31 Linuxservername boe_cmsd[571]: The root server reported an error Initialization Failure. (Reason: BusinessObjects Enterprise CMS: Unable to connect to the CMS system database ""UKOLTU55"". Reason: ORA-12541: TNS:no listener BusinessObjects Enterprise CMS: Unable to connect to the CMS system database ""UKOLTU55"". Reason: ORA-12541: TNS:no listener CDatabase::Open failure. ).
Jan 28 19:51:31 Linuxservername boe_cmsd[571]: Central Management Server stopped
Jan 28 19:51:31 Linuxservername boe_cmsd[575]: Central Management Server started
Jan 28 19:51:48 Linuxservername boe_cmsd[575]: BusinessObjects Enterprise CMS: Partially connected to CMS system database ""UKOLTU55"". 14 CMS system database connections were requested, but only 6 connections could be established. Reason: ORA-12541: TNS:no listener
...
Jan 28 19:52:48 Linuxservername boe_cmsd[575]: BusinessObjects Enterprise CMS: Partially connected to CMS system database ""UKOLTU55"". 14 CMS system database connections were requested, but only 10 connections could be established. Reason: ORA-12541: TNS:no listener
Jan 28 19:53:48 Linuxservername boe_cmsd[575]: BusinessObjects Enterprise CMS: Successfully established all 14 connections to CMS system database ""UKOLTU55"".
....
Jan 28 20:47:39 Linuxservername boe_cmsd[575]: CMS is unstable and will shut down immediately. Reason: System is out of memory.
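To see how the connection pool recovers over time, the "Partially connected" and "Successfully established" messages can be pulled out of the log with two regular expressions. This is a sketch that assumes the message wording shown above and a 15-character syslog timestamp at the start of each line; the function name is made up for illustration.

```python
import re

PARTIAL = re.compile(
    r"(\d+) CMS system database connections were requested, "
    r"but only (\d+) connections could be established"
)
COMPLETE = re.compile(r"Successfully established all (\d+) connections")


def connection_timeline(log_text):
    """Return a list of (timestamp, established, requested) tuples in log order."""
    timeline = []
    for line in log_text.strip().splitlines():
        line = line.strip()
        timestamp = line[:15]  # e.g. "Jan 28 19:51:48"
        partial = PARTIAL.search(line)
        complete = COMPLETE.search(line)
        if partial:
            requested, established = int(partial.group(1)), int(partial.group(2))
            timeline.append((timestamp, established, requested))
        elif complete:
            total = int(complete.group(1))
            timeline.append((timestamp, total, total))
    return timeline


# Lines copied from the log excerpt above.
SAMPLE = '''
Jan 28 19:51:48 Linuxservername boe_cmsd[575]: BusinessObjects Enterprise CMS: Partially connected to CMS system database ""UKOLTU55"". 14 CMS system database connections were requested, but only 6 connections could be established. Reason: ORA-12541: TNS:no listener
Jan 28 19:52:48 Linuxservername boe_cmsd[575]: BusinessObjects Enterprise CMS: Partially connected to CMS system database ""UKOLTU55"". 14 CMS system database connections were requested, but only 10 connections could be established. Reason: ORA-12541: TNS:no listener
Jan 28 19:53:48 Linuxservername boe_cmsd[575]: BusinessObjects Enterprise CMS: Successfully established all 14 connections to CMS system database ""UKOLTU55"".
'''

if __name__ == "__main__":
    for timestamp, established, requested in connection_timeline(SAMPLE):
        print(timestamp, f"{established}/{requested}")
```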
Thanks,
Karthik
Is it a VM server? If so, make sure the server has enough resources available at all times.
Hello,
It looks like your report bursting is creating too much load. Check the following note:
http://service.sap.com/sap/support/notes/1463190
Also consider setting the ulimit values on your hosts.
Regards
-Seb.
Can you show the result of the ulimit -a command here?
You need to troubleshoot this issue instead of looking for one answer that fixes everything.
For example, what is the load threshold that triggers the CMS server crash?
What happens with the CMS DB at that time? (Is your CMS DB on the same box as BOE or on a different one?)
What are the resource usage metrics at the time of failure?
Have you analyzed the core files themselves?
Hi Denis,
I have been trying my best for the last few weeks, together with SAP, to understand the core issue; however, it is still a mystery.
> ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
scheduling priority (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) 270335
max locked memory (kbytes, -l) 32
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority (-r) 0
stack size (kbytes, -s) 10240
cpu time (seconds, -t) unlimited
max user processes (-u) 270335
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
Below are my observations from troubleshooting:
1. The CMS breaks at a threshold of 3.9 GB.
2. The CMS DB sits on a different Linux server than the BOE server.
3. All core files were generated by the boe_cmsd process and are almost 4 GB in size (the same as the threshold at which it breaks).
4. A shell script that I added on the BOE servers shows that the CMS DB is available and accepting connections at the time of the CMS crash.
5. SAP analysed the core files and is suspicious of the lines below.
#3 0x58687b80 in skgesigCrash ()
from /opt/oracle/product/11.1.0/client_1/lib32/libclntsh.so
#4 0x58687e0d in skgesig_sigactionHandler ()
I'll continue troubleshooting in the hope of fixing it as soon as possible.
Thanks,
Karthik
Well, the open files ulimit is still 1024, which is well below what's needed.
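As a rough sketch, the open-files limit that a process actually inherits can be checked from Python's standard `resource` module when run as the account that starts the BOE servers. The `REQUIRED_NOFILE` value below is a placeholder assumption for illustration only; the figure actually needed should come from SAP's guidance for your system.

```python
import resource

# Placeholder assumption, not an SAP-documented value.
REQUIRED_NOFILE = 65536


def nofile_limits():
    """Return the (soft, hard) open-file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def describe(limit):
    """Render a limit the way `ulimit` would, using 'unlimited' for no limit."""
    return "unlimited" if limit == resource.RLIM_INFINITY else str(limit)


def is_below(soft, required):
    """True when the soft limit is finite and smaller than the required value."""
    if soft == resource.RLIM_INFINITY:
        return False
    return soft < required


if __name__ == "__main__":
    soft, hard = nofile_limits()
    print("open files: soft =", describe(soft), "hard =", describe(hard))
    if is_below(soft, REQUIRED_NOFILE):
        print("soft limit is below the assumed requirement of", REQUIRED_NOFILE)
```

The soft limit is the one that applies to the running process; the hard limit is the ceiling to which it can be raised without root privileges.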
By thresholds I meant the report load in your scheduling. Does it crash when you have 10 report jobs, or 100, or somewhere in between?
XI 3.1 is a 32-bit architecture and the CMS is a 32-bit process, so it can't reach 3.9 GB; 2 GB is the limit. So please clarify what you mean.
What exactly do your shell scripts do in relation to Oracle?
The CMS DB analysis has to be done on the Oracle DB itself: you need to see how many connections there are, how fast queries are returned, and so on.
Basically, this sounds like you need a proper sizing exercise for the type of load you're placing on this XI 3.1 system.
If this is the location of the crash:
#3 0x58687b80 in skgesigCrash () from /opt/oracle/product/11.1.0/client_1/lib32/libclntsh.so
then maybe the fact that your CMS DB is on version 10 and your client is on version 11 has something to do with it.
Per the supported platforms guide, if the CMS DB is on Oracle 10, the client has to be 10.
Likewise, 11 pairs with 11 and 11.2 with 11.2.
Thanks Denis & Seb. I'll work with internal change management before updating the ulimit (to unlimited) and the Oracle client to 10GR2.4. Considering the reporting database (11GR2.4), we even tried installing the 11GR2.4 client as per SAP's recommendation; however, it didn't help.
Below is the output of my memory analysis shell script, which shows the CMS crashing at 3.6g.
Wed Jan 28 19:05:35 GMT 2015
MemTotal: 32947392 kB
MemFree: 13285252 kB
Buffers: 920916 kB
Cached: 13169796 kB
SwapCached: 196 kB
PID USER VIRT RES CPU% MEM% PROC_NAME
8295 bobje 25 3820m 3.6g 51m R 100.4 11.3 7:51.34 boe_cmsd
9492 bobje 18 376m 252m 57m S 0.0 0.8 2:34.72 WIReportServer
9589 bobje 18 278m 153m 57m S 0.0 0.5 0:42.87 WIReportServer
**************EOL******************
Wed Jan 28 19:05:45 GMT 2015
MemTotal: 32947392 kB
MemFree: 16565332 kB
Buffers: 921304 kB
Cached: 13554876 kB
SwapCached: 196 kB
PID USER VIRT RES CPU% MEM% PROC_NAME
11260 bobje 15 88380 28m 18m S 46.5 0.1 0:04.36 boe_cmsd
9492 bobje 18 376m 252m 57m S 0.0 0.8 2:34.73 WIReportServer
9589 bobje 18 278m 153m 57m S 0.0 0.5 0:42.87 WIReportServer
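To put the boe_cmsd figures above in context, the VIRT value reported by top can be converted to bytes and compared with the 4 GiB (2^32 bytes) that a 32-bit process can address at most. This is a sketch that assumes top's default units (bare numbers in KiB, with `m` and `g` suffixes for MiB and GiB). How much of that space is actually usable depends on the kernel, so the ratio is only an upper-bound view. The function names are made up for illustration.

```python
UNIT_BYTES = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}

# Theoretical maximum address space of a 32-bit process.
ADDRESS_SPACE_32BIT = 2 ** 32


def top_size_to_bytes(value):
    """Convert a top memory column such as '3820m', '3.6g' or '88380' to bytes."""
    value = value.strip().lower()
    if value[-1] in UNIT_BYTES:
        return int(float(value[:-1]) * UNIT_BYTES[value[-1]])
    return int(value) * 1024  # bare numbers are assumed to be KiB


def share_of_32bit_space(virt):
    """Fraction of the 32-bit address space taken by the given VIRT value."""
    return top_size_to_bytes(virt) / ADDRESS_SPACE_32BIT


if __name__ == "__main__":
    # VIRT values of boe_cmsd from the two snapshots above.
    for virt in ("3820m", "88380"):
        print(virt, f"{share_of_32bit_space(virt):.1%}")
```

With the values above, the first snapshot's VIRT of 3820m is roughly 93% of the 32-bit address space, which is in line with the core files of almost 4 GB.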
I'll keep you posted on the result after ulimit change.
Thanks,
Karthik