
Events Cluster Edition

Former Member
0 Kudos

Hi

I have an ASE Cluster Edition 15.7 SP121 and, in rare circumstances, all the processes in the database end up in status sleeping. There are no errors in the Sybase errorlog. The wait event IDs for these processes are 512 (waiting for buffer validation in cmcc_bufsearch) and 509 (waiting for buffer read in cmcc_bufsearch). Additionally, I need documentation about the possible wait events in Cluster Edition.
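
For reference, this is roughly how I see which wait event each sleeping spid is on (a sketch; it assumes the MDA monitoring tables are enabled, and uses the standard monProcess and monWaitEventInfo tables):

-- sleeping spids currently on the CMCC buffer wait events
select p.SPID, p.DBName, p.Command, p.WaitEventID, w.Description
from master..monProcess p, master..monWaitEventInfo w
where p.WaitEventID = w.WaitEventID
  and p.WaitEventID in (509, 512)
go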

The only solution so far has been to kill the dataserver process at the OS level, because it is not possible to kill the spids or shut down the server normally.

Can anybody help me?

Accepted Solutions (0)

Answers (4)

Former Member
0 Kudos

Is this resolved? Is SAP still recommending that only one node of the cluster be used for write operations to a particular object? If so, how will it compete with Oracle RAC? They seem to have no issues writing to the same object from both sides of the RAC.

former_member182259
Contributor
0 Kudos

I can't speak to whether or not the individual's problem was resolved; however, I can speak to the notion of writing to the same object. Attempting to write to the same table from different nodes is often used as a measuring stick for horizontal scalability, which is something shared-disk clusters are horrible at. While it is possible to write to an object from any node of the cluster, you will suffer a performance hit while doing so, and how bad the hit is depends on the situation. For example, if inserting into a heap table or a table ordered by a monotonic index (e.g. trade_date), all the inserts go after the last page. As a common page, it has to be synchronized in the caches of all cluster nodes, which essentially means that any write from any node is competing against all other nodes. A point to remember is that we are not just talking data pages but also index pages, which, due to the nature of sorted leaf pages, are a highly contested area for cache synchronization. Further, keep in mind that to insert a single row we often perform 20+ IOs, since we also need to traverse each index tree: with ~6 indexes at an index level of 5, there's 30 IOs. And if inserts are happening on different nodes, we are likely attempting cache synchronization on multiple pages while negotiating locking on many more for read consistency of the intermediate nodes of the index trees, etc.

Where some have tried to alleviate this problem is by partitioning the tables/indexes and binding different partitions to different nodes, e.g. A-F on node1, G-K on node2, etc. The problem with this becomes apparent when you have a transaction that inserts both an 'F' row and a 'J' row: the immediate question is whether you treat this as an oddity and do the cache synchronization as a rare event, or whether you do a 2PC across the nodes. Given the index problem above, early tests showed that query fragmentation, in which parts of the query are sent to other nodes, was much faster than cache synchronization, but there is considerable overhead in the 2PC layer versus an SMP implementation, in addition to the network latency. Fundamentally, even RAC customers agree that horizontal scaling works best when there is no contention between nodes, which implies either implicit or explicit application partitioning in conjunction with a database implementation that aids the separation.
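
To make the A-F / G-K idea concrete, here is a minimal sketch of the kind of range partitioning involved (the table, column, and partition names are hypothetical, and nothing in the DDL ties a partition to a node; that association is purely an application/logical-cluster routing convention):

-- hypothetical table split into alphabetical ranges
create table customer_orders
(
    cust_name  varchar(30) not null,
    order_id   int         not null,
    trade_date datetime    not null
)
partition by range (cust_name)
(
    p_a_f  values <= ('Fzzzz'),  -- intended to be written from node1
    p_g_k  values <= ('Kzzzz'),  -- intended to be written from node2
    p_rest values <= (MAX)
)
go

The problem described above shows up as soon as a single transaction touches rows in both p_a_f and p_g_k.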

Where this technique has worked best, and a common use case with RAC, is DSS systems in which query fragmentation/distributed parallel query processing can provide performance boosts for large queries. However, for OLTP systems we have found that the real impediments to scaling are elsewhere, and that attempting to scale horizontally often produces negative results. For example, one of the biggest bottlenecks to OLTP scaling is IO processing, especially when using HDDs, whether in a SAN or not (and most SDCs require a SAN for shared storage). One customer test went from 3K inserts/sec on an HDD-based SAN to 100K inserts/sec using SSDs, a roughly 30x scaling factor that horizontal scaling could not have achieved, simply because the bottleneck was the IO subsystem, which would be shared. With 16sp02 we have added a number of features which we have seen improve performance by 2x-7x, which is better than horizontal scaling, which is often well under 2x even for 2 nodes.

The funny thing is that the most oft-quoted reason for horizontal scaling is "start small/grow big". In reality, the economics are heavily against you on that one. Five years ago, the top-end HP DL580 was roughly $100K and supported a maximum of 32 cores. Today, that same box has double the cores at the same price. So it simply makes sense, especially from a power and cooling standpoint, to rip and replace.

Having said all of that, ASE CE 16sp01 (just released in December for Linux) includes support for RDMA. One of the biggest impediments we had for any internode communication with ASE CE 15.7 was the reliance on UDP, a packet-framing protocol with considerable latency built into the network handling. RDMA is similar to disk DMA in that it provides much lower latency by reaching the remote data directly. Early tests showed that this improved general CIPC performance by at least 35%, and in the case of "badly partitioned" applications, by 200%. Does that mean we are now suggesting horizontal scalability is a recognized use case for ASE CE? No. It does mean, however, that a lot of applications may benefit from the dramatically reduced CIPC overhead. Will you be able to scale horizontally with a specific app? Who knows; only testing can say for sure. However, the odds are not good just from a DBMS/SDC science perspective.

Former Member
0 Kudos

Thanks for the explanation, Jeff!

Former Member
0 Kudos

Hi Cristian,

Have you figured out the reason for these events?

Recently I've had the same situation, and in some circumstances the cluster (with 2 nodes) hung.

In my case the problem was the configuration of two parameters related to cache replacement.

Both were set to 2000, and we changed them to 0 (a sketch of the commands follows the parameter list):

"number of index trips"

"number of oam trips"

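The change itself was just the two sp_configure calls (a sketch; both are standard server-wide options, so check whether a restart is needed at your SP level):

-- put the index/OAM aging "trips" back to the default of 0
sp_configure 'number of index trips', 0
go
sp_configure 'number of oam trips', 0
go
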
Fernando

javier_barthe
Participant
0 Kudos

Hi Cristian,

I have been through this situation many times with many processes, and the final recommendation of Sybase (SAP today) is not to have any object on which both instances perform DML operations, because of the internal synchronization of locks and pages. The processes sleep on events mostly related to CLM or OCM.

SAP released SP 130 with many changes; one of them is the single-instance database feature, which lets you configure ownership of a database by a particular instance. Have you taken a look at it? Perhaps it may help you.

Regards.
Javier.

sap_mk
Active Participant
0 Kudos

Hi Cristian,

How are you accessing data in the cluster? All applications that access a given database should be coming through a single instance/node. We do not recommend inserts/updates/deletes to a given database from multiple instances. I'd like to rule this out before digging deeper.

Regards,

Mark Kusma

Former Member
0 Kudos

Hi Mark,

The cluster is active/active and there are two logical clusters:

                   Node1      Node2
Logical Cluster 1: base       failover   (users can Insert/Update/Delete)
Logical Cluster 2: failover   base       (users can only Select)
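
For reference, this is roughly how we check the logical cluster definitions and routes from the monitoring side (a sketch; it assumes the CE MDA tables monLogicalCluster and monLogicalClusterRoute are available):

-- logical clusters and their current state
select * from master..monLogicalCluster
go
-- which applications/logins/databases are routed to which logical cluster
select * from master..monLogicalClusterRoute
go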

We have implemented monitors on the two nodes with the same user writing simultaneously to the same database, but the problem has occurred twice in the last month.

Thanks!

Cristian!

sap_mk
Active Participant
0 Kudos

Hi Cristian,

Hopefully users can also Select using Logical Cluster 1. That would bring the page into cache in Node1 so that when the insert/update/delete happens, the page is already there with the proper cluster lock.

I am concerned about the monitor that updates the same database from both nodes. We would not recommend this because of the contention on the last log page and the processing required to frequently move it between nodes. If the write activity is infrequent, then it might not be much of an impact, however, frequent writes will really impact performance. I would avoid writes to the same database from multiple instances.

The problem you described sounds like a spid requested a cluster lock or a change to an existing lock and communication was required with the other instance that never completed. For example, instance1 sends a message to instance2 to send a page across the interconnect and for some reason instance2 either never receives the message or never responds. In this case, instance1 is left waiting for instance2 and instance2 doesn't realize it has any outstanding request. You would need a cluster wide shared memory dump when this condition happens so that we can check the status on both nodes.

To set this up in advance, use these commands, substituting a directory that is valid from both nodes:

sp_shmdumpconfig 'add','dbcc',null,1,'/directory',null,'cluster_all'

go

sp_configure 'dump on conditions',1

go

When the cluster reaches this condition where the spids are sleeping, login and issue:

dbcc traceon(3604)

dbcc memdump('1')

go

The memdumps will be written from both nodes, each containing the local ASE shared memory. You can then open an incident and provide these memdumps for analysis. Once the memdumps are captured, you can proceed to kill and restart the cluster.

Mark

Former Member
0 Kudos

Hi Mark,

Thanks for your answer. I am going to set up the memory dump for the next event.

The cluster has 2 private interconnects. In the output of the monCIPCLinks table, the PassiveState column for the second interconnect is usually "In doubt". Is that a normal status?

130050326400priv1-cluster1priv2-cluster1UpUp
126002200priv1-cluster2priv2-cluster2In doubtUp
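
That output comes from a query along these lines (a minimal sketch, run from one of the instances):

-- state of the private interconnect links between instances
select * from master..monCIPCLinks
go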

Do you have documentation about the wait event IDs higher than 350?

Thanks,

Cristian!

sap_mk
Active Participant
0 Kudos

Hi Cristian,

Yes, "In doubt" is correct for the secondary interconnect. Some of the wait events over 350 are documented in the P&T Guide, but not the ones you are seeing. I'll see what I can find internally on them.

http://help.sap.com/Download/Multimedia/ASE_16.0/pttables.pdf

Mark

Former Member
0 Kudos

Hi Mark,

Thanks for your help. I am going to study this document and apply your recommendations for the next event.

Cristian,

Former Member
0 Kudos

Hi Mark:

When you say "a directory that is valid from both nodes", does it have to be NFS?

Does each instance generate its own file?

Thanks!

Cristian

sap_mk
Active Participant
0 Kudos

Hi Cristian,

It does not have to be nfs. The path just has to exist on both nodes. For example, if you specify "/sybase/memdumps/", be sure that directory exists on both. I've seen customers specify a path that only exists on one node and then the CSMD fails.

Also, we are going to start posting some Wait Event information on the ASE wiki space (there is a lot of other information there too):

SAP ASE Home - SAP ASE - SCN Wiki

Regards,

Mark

Former Member
0 Kudos

Hi Mark,

The problem occurred again yesterday. I set up the memory dump following your instructions (the same path exists on both nodes, and I ran the commands as shown), but it did not generate two files (one for each node), only one file.

Do you know what the problem is?

Thanks.

sap_mk
Active Participant
0 Kudos

Hi Cristian,

Can you post the output from sp_shmdumpconfig (with no options)? Also, are there any messages in the errorlog from when you triggered the memdump? Any messages in your isql session where you triggered it?

Regards,

Mark

Former Member
0 Kudos

Hi Mark,

1. The output from sp_shmdumpconfig  is the following:

Configured Shared Memory Dump Conditions
----------------------------------------
  Dbcc       ---
    Type:                   csmd
    Maximum Dumps:          1
    Dumps since boot:       1
    Halt Engines:           Default (Halt)
    Cluster:                All
    Page Cache:             Default (Omit)
    Procedure Cache:        Default (Include)
    Unused Space:           Default (Omit)
    Dump Directory:         /backup/mem_dump
    Dump File Name:         mem_dump_sleep
    Estimated csmd Size:    31984 MB
  Defaults   ---
    Type:                   csmd
    Maximum Dumps:          1
    Halt Engines:           Halt
    Cluster:                Local
    Page Cache:             Omit
    Procedure Cache:        Include
    Unused Space:           Omit
    Dump Directory:         $SYBASE
    Dump File Name:         Generated File Name
    Estimated csmd Size:    31984 MB

Current number of conditions: 1
Maximum number of conditions: 10

Configurable Shared Memory Dump Configuration Settings
------------------------------------------------------
Dump on conditions: 1
Number of dump threads: 1
Include errorlog in dump file: 1
Merge parallel files after dump: 1
Shared memory dump file compression level: 0

Server Memory Allocation
Procedure Cache  Data Caches  Server Memory  Total Memory
---------------  -----------  -------------  ------------
        2258 MB     57802 MB       29727 MB      89785 MB

NOTE: Dump file size estimates are approximate.  If Cluster
      mode is set to All for a dump condition then a shared
      memory dump file will be created for each instance in
      the cluster. The estimated file size represents the total
      amount of space required for all shared memory dump
      files created for all instances.

2. These are the messages in the errorlog while the memory dump was executing:

01:0002:00000:00649:2014/12/02 08:09:53.17 server  DBCC TRACEON 3604, SPID 649
01:0002:00000:00649:2014/12/02 08:09:53.18 server  dbcc memdump('1') executed.
01:0002:00000:00649:2014/12/02 08:09:53.18 server  Shared memory dump initiation message sent to other nodes in the cluster.
01:0002:00000:00649:2014/12/02 08:09:53.18 server  Initiating shared memory dump for dbcc 0.
01:0002:00000:00649:2014/12/02 08:09:53.21 kernel  Dumping shared memory to dump file: /backup/mem_dump/mem_dump_sleep
01:0002:00000:00649:2014/12/02 08:09:53.21 kernel  Writing segment 0:
01:0002:00000:00649:2014/12/02 08:09:53.21 kernel  Thread (0): Writing 25742370816 bytes starting at 0x0x2aaaaac00000
01:0001:00000:00000:2014/12/02 08:11:01.65 kernel  Warning: The internal timer is not progressing. If this message is generated multiple times, report to Sybase Technical Support and restart the server (alarminterval=-684).
01:0002:00000:00649:2014/12/02 08:13:18.74 kernel  Writing segment 1:
01:0002:00000:00649:2014/12/02 08:13:18.74 kernel  Thread (0): Writing 6066587648 bytes starting at 0x0x2abba59d6800
01:0002:00000:00649:2014/12/02 08:14:30.71 kernel  Writing segment 2:
01:0002:00000:00649:2014/12/02 08:14:30.71 kernel  Thread (0): Writing 1050198016 bytes starting at 0x0x2abf03363000
01:0002:00000:00649:2014/12/02 08:14:43.42 kernel  Writing segment 3:
01:0002:00000:00649:2014/12/02 08:14:43.42 kernel  Thread (0): Writing 658233344 bytes starting at 0x0x2ac06ddef000
01:0002:00000:00649:2014/12/02 08:14:51.01 kernel  Writing segment 4:
01:0002:00000:00649:2014/12/02 08:14:51.01 kernel  Thread (0): Writing 81920 bytes starting at 0x0x2ac0952ac800
01:0002:00000:00649:2014/12/02 08:14:51.01 kernel  Writing segment 5:
01:0002:00000:00649:2014/12/02 08:14:51.01 kernel  Thread (0): Writing 9895936 bytes starting at 0x0x2ac0954bc800
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel  Writing segment 6:
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel  Thread (0): Writing 14336 bytes starting at 0x0x2ac09abfc800
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel  Copying errorlog into dump file.
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel  Dump complete in 299 seconds.
01:0002:00000:00649:2014/12/02 08:14:51.15 kernel  7 segments of total size 33527382016 bytes written to dump file.
01:0002:00000:00649:2014/12/02 08:14:51.15 server  Shared memory dump completed successfully.

3. No error messages were generated in the isql session.

Thanks!

Cristian.

sap_mk
Active Participant
0 Kudos

Any output from instance2's errorlog? Is /backup/mem_dump/ a shared directory between the nodes? If so, since you specified a file name (mem_dump_sleep), it may have been overwritten or prevented from generating on the second node. The file name should be specified as "null" as in my example, so that a generated file name is used.
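
Something along these lines (a sketch; please double-check the sp_shmdumpconfig argument order against the manual for your SP level before running it):

-- drop the dbcc condition that has the fixed file name
sp_shmdumpconfig 'drop', 'dbcc', null
go
-- re-add it with a null file name so each instance generates its own file
sp_shmdumpconfig 'add', 'dbcc', null, 1, '/backup/mem_dump', null, 'cluster_all'
go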

Mark

Former Member
0 Kudos

Hi Mark,

The path /backup/mem_dump/ is a local directory on both nodes, not a shared directory. Is it necessary for this file system to be shared (NFS) between both nodes?

Thanks.

Cristian.

sap_mk
Active Participant
0 Kudos

Hi Cristian,

No need for that, just wanted to check. Any messages in instance2's errorlog that might indicate why it didn't generate a dump file?

Mark

Former Member
0 Kudos

Hi Mark,

The errorlog of node 2 did not show any messages; only node 1 generated the messages that I posted.

Thanks.

sap_mk
Active Participant
0 Kudos

Hi Cristian,

Okay, please open an incident with Support using component BC-SYB-ASE-CE and attach the errorlogs from both nodes (maybe some earlier messages have clues). They will provide an FTP location for you to upload the memdump.

Mark

sap_mk
Active Participant
0 Kudos

Hi Cristian,

Did you ever open an incident? If so, please post the incident number since I didn't see it come through.

Mark