on 03-31-2015 10:11 PM
Hello!
Previously I asked questions on forums.sybase.com, and this is my first post here, so hopefully I am in the right place.
I am seeing a strange situation in a Production environment that I can't reproduce in Dev environments. Some mobile users report that their synchronization stops working, and the following error appears in the MobiLink logs (user-specific information suppressed):
I. 2015-03-31 17:43:18. <15> Request from "UL 16.0.2041" for: remote ID: 3, user name: XXX, version: XXX
I. 2015-03-31 17:43:18. <15> The sync sequence ID in the consolidated database: 8532358e03454d7db35f8c29093b2aad; the remote previous sequence ID: 0c5d767ebaed4f82b60373b992d81d87, and the current sequence ID: 72fbaada560b4f86a6d69b09ff2edfd9
E. 2015-03-31 17:43:18. <15> [-10400] Invalid sync sequence ID for remote ID '3'
I. 2015-03-31 17:43:19. <15> Synchronization failed
As far as I know, this kind of problem would occur only if an old version of the remote database was somehow restored on the device; this is the only way I have been able to reproduce it. However, my Service Desk confirmed that neither they nor the users themselves are touching the database file in any way.
Is there anything I could do to pin down this problem? What else could let these sequence IDs get out of sync?
Hi Andre,
Do you know what happened with remote ID 3 prior to this synchronization?
The error basically means that the consolidated database and the remote UltraLite database are out of sync. Either the remote database was changed by restoring a backup copy (so that it reports 0c5d7...1d87 instead of 85323...2aad), or the consolidated database was changed by restoring a backup (from 0c5d7...1d87 to 85323...2aad).
The other explanation is that it might also happen if you have the same remote ID synchronizing to two MobiLink servers and the remote isn't 'cancelled' on the other server prior to it being seen again - what does your MobiLink infrastructure look like and how many servers are you using?
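When collecting evidence for either explanation, it may help to pull the three IDs out of each failing log entry and see which ones agree. The following is only a rough sketch: the pattern is modeled on the single log line quoted in this thread (other MobiLink versions may word it differently), and the function name is made up for illustration. It reports what the log says and makes no claim about how the server itself decides.

```python
import re

# Pattern modeled on the log line quoted in this thread (assumption: the
# wording is stable across the MobiLink builds you are looking at).
SEQ_LINE = re.compile(
    r"sync sequence ID in the consolidated database: (?P<consolidated>[0-9a-f]{32}); "
    r"the remote previous sequence ID: (?P<previous>[0-9a-f]{32}), "
    r"and the current sequence ID: (?P<current>[0-9a-f]{32})"
)


def classify_sequence_ids(log_line):
    """Report which remote ID, if any, equals the consolidated ID.

    Hypothetical helper: returns None when the line is not a sequence ID line.
    """
    match = SEQ_LINE.search(log_line)
    if match is None:
        return None
    ids = match.groupdict()
    if ids["consolidated"] == ids["previous"]:
        return "matches previous"
    if ids["consolidated"] == ids["current"]:
        return "matches current"
    return "matches neither"


line = (
    "I. 2015-03-31 17:43:18. <15> The sync sequence ID in the consolidated database: "
    "8532358e03454d7db35f8c29093b2aad; the remote previous sequence ID: "
    "0c5d767ebaed4f82b60373b992d81d87, and the current sequence ID: "
    "72fbaada560b4f86a6d69b09ff2edfd9"
)
print(classify_sequence_ids(line))  # matches neither
```

Running this over the logs of every MobiLink server behind the load balancer would show whether the same remote ID is reported with different consolidated IDs on different servers.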
Regards,
Jeff Albion
SAP Active Global Support
What is this mysterious "sequence ID"? Where is it stored, how is it generated, and what is it used for?
There is a consolidated database BINARY ( 16 ) column in ml_database called seq_id but it seems to be NULL. No corresponding column seems to exist in the SYS tables on the remote side.
If there is a tracking mechanism for synchronizations, surely it should be fully documented.
Hello Jeff,
We do have two MobiLink servers behind a load balancer. I don't have the exact load-balancing rules at hand, but looking at the logs of both servers, I don't see the same user syncing at the same time on both servers.
What I do see though, is that before the sequence ID error, the user experiences some network instability and the sync drops, generating the following log:
07:54:49 | Request from "UL 16.0.1823" for: remote ID: 131, user name: XX, version: XX
07:54:49 | The current synchronization is using a connection with connection ID 'SPID 107'
07:54:49 | The authenticate_parameters script returned 1000
07:54:49 | COMMIT Transaction: Authenticate user
07:54:50 | COMMIT Transaction: Begin synchronization
07:54:51 | COMMIT Transaction: Upload
07:54:52 | COMMIT Transaction: Prepare for download
07:54:56 | Sending the download to the remote database
07:54:56 | COMMIT Transaction: Download
07:55:27 | COMMIT Transaction: End synchronization
07:59:01 | [-10279] Connection was dropped due to lack of network activity
07:59:01 | Synchronization complete
Then, right after, the mobile app retry logic kicks in, and another sync is requested, which fails with the following reason:
07:55:04 | Request from "UL 16.0.1823" for: remote ID: 131, user name: XX, version: XX
07:55:04 | The current synchronization is using a connection with connection ID 'SPID 95'
07:55:04 | [-10002] Consolidated database server or ODBC error: ODBC: [Microsoft][SQL Server Native Client 11.0][SQL Server]Lock request time out period exceeded. (ODBC State = 42000, Native error code = 1222)
07:55:04 | [-10002] Consolidated database server or ODBC error: ODBC: [Microsoft][SQL Server Native Client 11.0][SQL Server]The cursor was not declared. (ODBC State = 42000, Native error code = 16945)
07:55:04 | [-10343] The remote database identified by remote ID '131' is already synchronizing or the database connection is unusable: unable to access the lock for that remote ID
07:59:08 | [-10279] Connection was dropped due to lack of network activity
07:59:08 | Synchronization failed
Thereafter the sequence ID error appears, and the user cannot sync anymore until they delete the remote database and start again.
Hi Breck,
These are internal UltraLite progress offsets (similar to SQL Anywhere progress offsets: DocCommentXchange). In previous versions, the progress offsets were simple integers, but in version 16 they were changed to GUIDs to avoid ambiguity in interpreting the integer when using multiple MobiLink servers.
There is a consolidated database BINARY ( 16 ) column in ml_database called seq_id but it seems to be NULL.
It shouldn't be for an UltraLite remote - here is what I see after a successful synchronization:
>> rid,remote_id,script_ldt,seq_id,seq_uploaded,sync_key,description
1,'cf54a0d9-49de-4ae3-bd9d-4e899305ebee','1900-01-01 00:00:00.000',0xf83327fccaa84c138dacaa249d050981,1,'f32c1ee20c5f4d25881701576c6bf273',
Regards,
Jeff Albion
SAP Active Global Support
It would help to have more details about the inner workings, since the lack of them leaves us guessing too much.
For example, from empirical testing I can say that, on the server side, the sequence ID is changed at the commit of the upload transaction. So if the upload succeeds but the download fails, the server sequence ID still changes.
I also assume that between the upload and the download (more precisely, right after the server commits the upload), the server sends the new sequence ID to the client so that the client updates its local database as well.
What I don't know is what happens if the server cannot contact the client after the upload commit. Would it revert the sequence ID change in that case? While writing this, I can think of ways to test this empirically, but it would save me some time to hear it straight from the devs.
Hi Andre,
Is this over TCP/IP or HTTP? It sounds like there's some state-tracking issue going awry if this is happening over a failed synchronization attempt.
Are you positive these are non-overlapping requests? The times suggest otherwise...
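For what it's worth, the overlap can be checked mechanically from the first and last timestamps of each logged synchronization. A minimal sketch, using the start and end times from the two logs posted above (07:54:49 to 07:59:01 and 07:55:04 to 07:59:08); the function names are made up for illustration, and it assumes both timestamps fall on the same day:

```python
from datetime import datetime


def parse_time(hhmmss):
    """Parse an HH:MM:SS log timestamp (assumes same-day timestamps)."""
    return datetime.strptime(hhmmss, "%H:%M:%S")


def overlaps(first, second):
    """Return True if two (start, end) log intervals share any time."""
    start1, end1 = map(parse_time, first)
    start2, end2 = map(parse_time, second)
    return start1 < end2 and start2 < end1


first_sync = ("07:54:49", "07:59:01")   # sync on 'SPID 107'
second_sync = ("07:55:04", "07:59:08")  # retry on 'SPID 95'
print(overlaps(first_sync, second_sync))  # True
```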
Can you open an incident for this? We would likely need to gather additional network diagnostics to try and figure out what's going wrong in these specific circumstances.
Regards,
Jeff Albion
SAP Active Global Support
It's over HTTP.
Yes, there are overlapping requests due to our mobile app retry logic, which waits a few seconds after a failed sync before trying again. In this particular case, our mobile app waits only two seconds before a retry, which may be too short. Do you think avoiding these overlaps could be a solution?
I will ask internally for an incident to be opened, since I don't have the marketplace login for the company.
Hi Andre,
Do you think avoiding these overlaps could be a solution?
Yes. We have seen issues in previous versions with a similar overlap problem, and our recommendation has always been to "back off" the sync timeout and the application retry logic to something less aggressive, which resolves the problem.
We would still be very interested however to understand the root cause of why it's happening in the first place as there is logic in the MobiLink server to try and avoid this very situation from happening.
Do you know if your load balancer is caching HTTP requests at all?
Regards,
Jeff Albion
SAP Active Global Support
Jeff,
I don't think there is caching, but to be certain I am consulting our Ops.
In the meantime I wondered about the server timeout when there is a loss of connection, since it means our mobile app sync retry logic must wait at least this long before starting another sync. Looking at the logs it seems to be ~4 minutes. Is it configurable? Is there any downside to reducing it?
Hi Andre,
since it means our mobile app sync retry logic must wait at least this time before starting another sync.
Yes, precisely. We would recommend at least the timeout value plus a small "fudge factor" to ensure previous synchronizations are cleared from the system before attempting to synchronize again.
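As a rough illustration of that recommendation, the retry delay could be computed as below. This is only a sketch: the 240-second default comes from the roughly 4 minute timeout seen in the logs in this thread, while the 15-second fudge factor, the doubling back-off, and the 30-minute cap are arbitrary assumptions, not values from the MobiLink documentation.

```python
def retry_delay_seconds(attempt, server_timeout=240, fudge=15,
                        backoff=2.0, max_delay=1800):
    """Seconds to wait before retry number `attempt` (0 = first retry).

    The first retry waits at least the server timeout plus a fudge factor;
    later retries back off further, up to max_delay. All defaults except
    server_timeout are illustrative assumptions.
    """
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    base = server_timeout + fudge
    return min(base * (backoff ** attempt), max_delay)


for attempt in range(4):
    print(attempt, retry_delay_seconds(attempt))
# 0 255.0
# 1 510.0
# 2 1020.0
# 3 1800
```

If the server timeout is lowered, passing the new value as server_timeout keeps the retry delay in step with it.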
Looking at the logs it seems to be ~4 minutes. Is it configurable?
Yes. See "timeout" in the MobiLink client network protocol options.
Is there any downside from reducing it?
With a lower timeout, keep-alive messages will need to be sent more often and you may miss being able to continue active synchronizations if a temporary network problem occurs. As the documentation notes, we generally don't recommend setting this value below 30 seconds.
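To make that concrete, here is a small sketch of building the network protocol options string with a guard against going below 30 seconds. It assumes the options are passed as semicolon-separated name=value pairs with host, port and timeout keys; the helper name, host name and port are placeholders, so please check the exact syntax against the documentation for your client.

```python
MIN_RECOMMENDED_TIMEOUT = 30  # seconds, per the recommendation above


def build_stream_parms(host, port, timeout):
    """Build a protocol options string such as 'host=...;port=...;timeout=...'.

    Hypothetical helper; the name=value;... format is an assumption about
    how the application passes its MobiLink client network options.
    """
    if timeout < MIN_RECOMMENDED_TIMEOUT:
        raise ValueError(
            f"timeout of {timeout}s is below the recommended minimum "
            f"of {MIN_RECOMMENDED_TIMEOUT}s"
        )
    return f"host={host};port={port};timeout={timeout}"


# ml.example.com and port 80 are placeholders, not values from this thread.
print(build_stream_parms("ml.example.com", 80, 120))
# host=ml.example.com;port=80;timeout=120
```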
Regards,
Jeff Albion
SAP Active Global Support
Hello again.
I have a new development on this subject. Even after making the changes suggested in this thread, the "sequence ID error" continued to happen. We then began to suspect that something in our app could trigger this behavior: we routinely cancel syncs after starting them. We do this by setting the stop flag described in this documentation. The problem is that the server does not seem to acknowledge the interruption, and keeps waiting for communication from the client for 240 seconds before giving up.
I don't know if we are facing a bug here. I find it reasonable to expect the client to inform the server that the synchronization is being interrupted, instead of letting it time out on its own.
In our case the client starts a new sync right after canceling the previous one, but the server is still waiting for the timeout, so overlapping syncs occur, which eventually lead to the sequence ID error described here.
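One client-side way to avoid the overlap, independent of whether the server ought to notice the cancellation, would be a small gate that refuses to start a new sync until the server-side timeout has had time to expire. The sketch below is hypothetical: the class and method names are invented, the 240 seconds is the wait observed above, and the 15-second margin is an arbitrary assumption. It does not call any UltraLite API; the app would call note_interrupted() wherever it sets the stop flag or sees a dropped sync.

```python
import time


class SyncGate:
    """Blocks new syncs for a quiet period after a cancelled or dropped sync."""

    def __init__(self, server_timeout=240, fudge=15, clock=time.monotonic):
        self._quiet_period = server_timeout + fudge
        self._clock = clock          # injectable so the gate can be tested
        self._blocked_until = None   # None means no interruption recorded

    def note_interrupted(self):
        """Record that a sync was cancelled or dropped just now."""
        self._blocked_until = self._clock() + self._quiet_period

    def seconds_until_allowed(self):
        """Seconds left before a new sync may start (0.0 if allowed now)."""
        if self._blocked_until is None:
            return 0.0
        return max(0.0, self._blocked_until - self._clock())

    def may_start(self):
        return self.seconds_until_allowed() == 0.0


gate = SyncGate()
gate.note_interrupted()
print(gate.may_start())  # False until about 255 seconds have passed
```

The obvious cost is that the user waits several minutes after every cancellation, so this only makes sense if cancellations are rare or the timeout has been reduced.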
Hello Chris.
Our app cancels the sync whenever a certain condition is met (for example, the user enters a certain screen), and when the condition is lifted the sync is started again. The stop flag can therefore be set at any moment during the synchronization.
Is this enough info or do you need any specific details?