on 10-24-2014 2:45 PM
Hi
I have an ASE Cluster Edition 15.7 SP121 and, in rare circumstances, all the processes in the database end up in status sleeping. There are no errors in the Sybase errorlog. The wait event IDs for these processes are 512 (waiting for buffer validation in cmcc_bufsearch) and 509 (waiting for buffer read in cmcc_bufsearch). Additionally, I need documentation about the possible wait events in Cluster Edition.
So far the only workaround has been to kill the dataserver process at the OS level, because it is not possible to kill the spids or shut down the server normally.
Can anybody help me?
Is this resolved? Is SAP still recommending using only one half of the cluster for write operations on a particular object? If so, how will it compete with Oracle RAC? RAC seems to have no issue writing to the same object from both sides.
I can't speak to whether the individual's problem was resolved; however, I can speak to the notion of writing to the same object. Attempting to write to the same table from different nodes is often used as a measuring stick for horizontal scalability, which is something shared-disk clusters are horrible at. While it is possible to write to an object from any node of the cluster, you will suffer performance hits while doing so, and the situation determines how bad the performance hit is. For example, if inserting into a heap table or a table ordered by a monotonic index (e.g. trade_date), all the inserts land after the last page. As a common page, it has to be synchronized in the caches among all cluster nodes, which essentially means that any write from any node is competing against all other nodes.

A point to remember is that we are not just talking data pages but also index pages, which, due to the nature of sorted leaf pages, are a highly contested area for cache synchronization. Further, keep in mind that to insert a single row we often perform 20+ IOs, since we also need to traverse each index tree: with ~6 indexes at an index level of 5, that's 30 IOs. And if inserts are happening on different nodes, we are likely attempting cache synchronization on multiple pages while negotiating locking on many more for read consistency of the intermediate nodes of the index trees, etc.
Some have tried to alleviate this problem by partitioning the tables/indexes and binding different partitions to different nodes, e.g. A-F on node1, G-K on node2, etc. The problem becomes apparent when a transaction inserts both an 'F' row and a 'J' row: the immediate question is whether this is an oddity and you do the cache synchronization as a rare event, or whether you do a 2PC across the nodes. Given the index problem above, early tests showed that query fragmentation, in which parts of the query are sent to other nodes, was much faster than cache synchronization; but there is considerable overhead in the 2PC layer versus an SMP implementation, in addition to the network latency. Fundamentally, even RAC customers agree that horizontal scaling works best when there is no contention between nodes, which implies either implicit or explicit application partitioning in conjunction with a database implementation that aids the separation.
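As a sketch of the partition-and-bind approach described above (the table and column names are hypothetical, and the syntax follows ASE range partitioning):

```sql
-- Hypothetical example: range-partition a table by a character key
-- so different ranges can be routed to different cluster nodes.
create table customers (
    cust_name varchar(40) not null,
    region    varchar(20)
)
partition by range (cust_name)
    (p1 values <= ('F'),     -- intended to be served by node1
     p2 values <= ('K'),     -- intended to be served by node2
     p3 values <= (MAX))     -- remainder
```

Note that the "binding" of partitions to nodes is application-level routing, not something the partition definition enforces, so a transaction touching both an 'F' row and a 'J' row still forces the cross-node synchronization described above.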
Where this technique has worked best, and it is a common use case with RAC, is for DSS systems, in which query fragmentation/distributed parallel query processing can boost performance for large queries. For OLTP systems, however, we have found that the real impediments to scaling are elsewhere, and that attempting to scale horizontally often produces negative results. For example, one of the biggest bottlenecks to OLTP scaling is IO processing, especially when using HDDs, whether in a SAN or not (and most SDCs require a SAN for shared storage). One customer test went from 3K inserts/sec on an HDD-based SAN to 100K inserts/sec using an SSD, roughly a 30x scaling factor that horizontal scaling could not have achieved, simply because the bottleneck was the IO subsystem, which would have been shared. With 16sp02 we have added a number of features that we have seen improve performance by 2x-7x, which is better than horizontal scaling, often <<2x even for 2 nodes.
The funny thing is that the most often quoted reason for horizontal scaling is "start small/grow big". In reality, the economics are extremely against you on that one. Five years ago, the top-end HP DL580 was roughly $100K and supported a maximum of 32 cores. Today, that same box has double the cores but still the same price. So it simply makes sense, especially from a power and cooling standpoint, to rip and replace.
Having said all of that, ASE CE 16sp01 (just released in December for Linux) includes support for RDMA. One of the biggest impediments to internode communication in ASE CE 15.7 was the reliance on UDP, a packet-framing protocol with considerable latency built into the network handling. RDMA is similar to disk IO DMA in that it provides much lower latency access directly to the remote data instead. Early tests showed that this improved general CIPC performance by at least 35%, and in the case of "badly partitioned" applications by 200%. Does that mean we are now suggesting horizontal scalability as a recognized use case for ASE CE? No. It does mean, however, that a lot of applications may benefit from dramatically reduced CIPC overhead. Will you be able to scale horizontally with a specific app? Who knows; only testing can say for sure. The odds are not good, though, just from a DBMS/SDC science perspective.
Hi Cristian,
have you figured out the reason for these events?
Recently I've had the same situation, and in some circumstances the cluster (with 2 nodes) hung.
In my case the problem was the configuration of two parameters related to cache replacement:
"number of index trips"
"number of oam trips"
Both were set to 2000, and we changed them to 0.
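For reference, the change described above would be applied with sp_configure (a sketch; both parameters default to 0 in standard configurations):

```sql
-- Reset the cache "trips" parameters back to their default of 0;
-- in this case they had been set to 2000.
sp_configure "number of index trips", 0
go
sp_configure "number of oam trips", 0
go
```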
Fernando
Hi Cristian,
I have been through this situation many times with many processes, and the final recommendation from Sybase (SAP today) is not to have any object on which both instances perform DML operations, because of the internal synchronization of locks and pages. The processes sleep on events mostly related to CLM or OCM.
SAP released SP130 with many changes; one of them is the single-instance database feature, which lets you configure ownership of a database to a specific instance. Have you taken a look at it? Perhaps it may help you.
Regards.
Javier.
Hi Cristian,
How are you accessing data in the cluster? All applications that access a given database should be coming through a single instance/node. We do not recommend inserts/updates/deletes to a given database from multiple instances. I'd like to rule this out before digging deeper.
Regards,
Mark Kusma
Hi Mark,
The cluster is active/active and there are two logical clusters:
Logical Cluster 1: Node1 = base, Node2 = failover (users can insert/update/delete)
Logical Cluster 2: Node1 = failover, Node2 = base (users can only select)
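For readers unfamiliar with the setup, a configuration like the one above is built with sp_cluster logical. Roughly, and treating the logical cluster, instance, and login names as placeholders (verify the exact syntax for your SP level):

```sql
-- Hypothetical sketch: create a logical cluster with Node1 as the
-- base instance and Node2 as failover, then route a login to it.
sp_cluster logical, "create", LC1
go
sp_cluster logical, "add", LC1, instance, Node1
go
sp_cluster logical, "add", LC1, failover, Node2
go
sp_cluster logical, "add", LC1, route, login, app_user
go
sp_cluster logical, "online", LC1
go
```

The second logical cluster would be defined the same way with the base/failover roles reversed.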
We have implemented monitors on the two nodes, with the same user writing simultaneously to the same database, and the problem has occurred twice in the last month.
Thanks!
Cristian!
Hi Cristian,
Hopefully users can also Select using Logical Cluster 1. That would bring the page into cache in Node1 so that when the insert/update/delete happens, the page is already there with the proper cluster lock.
I am concerned about the monitor that updates the same database from both nodes. We would not recommend this because of the contention on the last log page and the processing required to frequently move it between nodes. If the write activity is infrequent, then it might not be much of an impact, however, frequent writes will really impact performance. I would avoid writes to the same database from multiple instances.
The problem you described sounds like a spid requested a cluster lock or a change to an existing lock and communication was required with the other instance that never completed. For example, instance1 sends a message to instance2 to send a page across the interconnect and for some reason instance2 either never receives the message or never responds. In this case, instance1 is left waiting for instance2 and instance2 doesn't realize it has any outstanding request. You would need a cluster wide shared memory dump when this condition happens so that we can check the status on both nodes.
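While waiting for the condition to reproduce, the sleeping spids and their wait events can be inspected through the MDA tables, assuming MDA monitoring is enabled (a sketch; column availability may vary by version):

```sql
-- Show processes waiting on the buffer-search events mentioned above,
-- together with ASE's built-in description of each wait event.
select p.SPID, p.WaitEventID, i.Description
from master..monProcess p, master..monWaitEventInfo i
where p.WaitEventID = i.WaitEventID
  and p.WaitEventID in (509, 512)
```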
To set this up in advance, use these commands, substituting a directory that is valid from both nodes:
sp_shmdumpconfig 'add','dbcc',null,1,'/directory',null,'cluster_all'
go
sp_configure 'dump on conditions',1
go
When the cluster reaches this condition where the spids are sleeping, login and issue:
dbcc traceon(3604)
dbcc memdump('1')
go
The memdumps will be written from both nodes, each containing the local ASE shared memory. You can then open an incident and provide these memdumps for analysis. Once the memdumps are captured, you can proceed to kill and restart the cluster.
Mark
Hi Mark,
Thanks for your answer, I am going to setup the memory dump for the next event.
The cluster has 2 private interfaces. In the output of the monCIPCLinks table, the PassiveState column for the second interface is usually "In doubt". Is that a normal status?
1 | 300 | 50326400 | priv1-cluster1 | priv2-cluster1 | Up | Up |
1 | 2600 | 2200 | priv1-cluster2 | priv2-cluster2 | In doubt | Up |
Do you have documentation about the event IDs higher than 350?
Thanks,
Cristian!
Hi Cristian,
Yes, "In doubt" is correct for the secondary interconnect. Some of the wait events over 350 are documented in the P&T Guide, but not the ones you are seeing. I'll see what I can find internally on them.
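In the meantime, the short descriptions that ASE itself carries for wait events can be read directly from the MDA tables:

```sql
-- List the built-in descriptions for the undocumented wait events.
select WaitEventID, Description
from master..monWaitEventInfo
where WaitEventID > 350
order by WaitEventID
```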
http://help.sap.com/Download/Multimedia/ASE_16.0/pttables.pdf
Mark
Hi Cristian,
It does not have to be nfs. The path just has to exist on both nodes. For example, if you specify "/sybase/memdumps/", be sure that directory exists on both. I've seen customers specify a path that only exists on one node and then the CSMD fails.
Also, we are going to start posting some Wait Event information on the ASE wiki space (there is a lot of other information there too):
SAP ASE Home - SAP ASE - SCN Wiki
Regards,
Mark
Hi Mark,
1. The output from sp_shmdumpconfig is the following:
Configured Shared Memory Dump Conditions
----------------------------------------
Dbcc ---
Type: csmd
Maximum Dumps: 1
Dumps since boot: 1
Halt Engines: Default (Halt)
Cluster: All
Page Cache: Default (Omit)
Procedure Cache: Default (Include)
Unused Space: Default (Omit)
Dump Directory: /backup/mem_dump
Dump File Name: mem_dump_sleep
Estimated csmd Size: 31984 MB
Defaults ---
Type: csmd
Maximum Dumps: 1
Halt Engines: Halt
Cluster: Local
Page Cache: Omit
Procedure Cache: Include
Unused Space: Omit
Dump Directory: $SYBASE
Dump File Name: Generated File Name
Estimated csmd Size: 31984 MB
Current number of conditions: 1
Maximum number of conditions: 10
Configurable Shared Memory Dump Configuration Settings
------------------------------------------------------
Dump on conditions: 1
Number of dump threads: 1
Include errorlog in dump file: 1
Merge parallel files after dump: 1
Shared memory dump file compression level: 0
Server Memory Allocation
Procedure Cache Data Caches Server Memory Total Memory
--------------- ----------- ------------- ------------
2258 MB 57802 MB 29727 MB 89785 MB
NOTE: Dump file size estimates are approximate. If Cluster
mode is set to All for a dump condition then a shared
memory dump file will be created for each instance in
the cluster. The estimated file size represents the total
amount of space required for all shared memory dump
files created for all instances.
2. The messages in the errorlog while the Memory Dump was executed:
01:0002:00000:00649:2014/12/02 08:09:53.17 server DBCC TRACEON 3604, SPID 649
01:0002:00000:00649:2014/12/02 08:09:53.18 server dbcc memdump('1') executed.
01:0002:00000:00649:2014/12/02 08:09:53.18 server Shared memory dump initiation message sent to other nodes in the cluster.
01:0002:00000:00649:2014/12/02 08:09:53.18 server Initiating shared memory dump for dbcc 0.
01:0002:00000:00649:2014/12/02 08:09:53.21 kernel Dumping shared memory to dump file: /backup/mem_dump/mem_dump_sleep
01:0002:00000:00649:2014/12/02 08:09:53.21 kernel Writing segment 0:
01:0002:00000:00649:2014/12/02 08:09:53.21 kernel Thread (0): Writing 25742370816 bytes starting at 0x0x2aaaaac00000
01:0001:00000:00000:2014/12/02 08:11:01.65 kernel Warning: The internal timer is not progressing. If this message is generated multiple times, report to Sybase Technical Support and restart the server (alarminterval=-684).
01:0002:00000:00649:2014/12/02 08:13:18.74 kernel Writing segment 1:
01:0002:00000:00649:2014/12/02 08:13:18.74 kernel Thread (0): Writing 6066587648 bytes starting at 0x0x2abba59d6800
01:0002:00000:00649:2014/12/02 08:14:30.71 kernel Writing segment 2:
01:0002:00000:00649:2014/12/02 08:14:30.71 kernel Thread (0): Writing 1050198016 bytes starting at 0x0x2abf03363000
01:0002:00000:00649:2014/12/02 08:14:43.42 kernel Writing segment 3:
01:0002:00000:00649:2014/12/02 08:14:43.42 kernel Thread (0): Writing 658233344 bytes starting at 0x0x2ac06ddef000
01:0002:00000:00649:2014/12/02 08:14:51.01 kernel Writing segment 4:
01:0002:00000:00649:2014/12/02 08:14:51.01 kernel Thread (0): Writing 81920 bytes starting at 0x0x2ac0952ac800
01:0002:00000:00649:2014/12/02 08:14:51.01 kernel Writing segment 5:
01:0002:00000:00649:2014/12/02 08:14:51.01 kernel Thread (0): Writing 9895936 bytes starting at 0x0x2ac0954bc800
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel Writing segment 6:
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel Thread (0): Writing 14336 bytes starting at 0x0x2ac09abfc800
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel
Copying errorlog into dump file.
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel Dump complete in 299 seconds.
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel 7 segments of total size 33527382016 bytes written to dump file.
01:0002:00000:00649:2014/12/02 08:14:51.15 server Shared memory dump completed successfully.
3. No error messages were generated in the isql session.
Thanks!
Cristian.
Any output from instance2's errorlog? Is /backup/mem_dump/ a shared directory between the nodes? If so, since you specified a file name (mem_dump_sleep), it may have been overwritten or prevented from generating on the second node. The file name should be specified as "null" as in my example, so that a generated file name is used.
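Following Mark's advice, the condition could be re-added with null for the file name so each node generates its own (a sketch; verify the drop syntax for your SP level before running it):

```sql
-- Remove the condition that pins the file name, then re-add it with
-- null so ASE generates a unique dump file name per node.
sp_shmdumpconfig 'drop', 'dbcc', null, 1
go
sp_shmdumpconfig 'add', 'dbcc', null, 1, '/backup/mem_dump', null, 'cluster_all'
go
```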
Mark