on 04-02-2015 1:43 PM
Folks;
Always in for something new: today I experienced two crashes in a row in our MaxDB installation. Judging from our application log files, both seem to have happened right after a particular SQL DELETE statement (always the same statement, with the same parameters):
[SAP AG][LIBSQLOD SO][MaxDB] General error;800 Implicit SERVERDB restart (connection aborted) [S1000]
After that, the database instance needs to be restarted. It happens reproducibly whenever I execute this particular statement. I don't really have an idea how to track this down. MaxDB is 7.8.02.28 on a Linux x86_64 system. Any comments on this issue are greatly appreciated.
Thanks in advance,
Kristian
Hi Kristian,
Check data shouldn't crash the database. I also checked the logs, and there are indeed some corrupted indexes.
However, right now the top priority is to bring the database back online, so please check whether you are able to do that. If you still cannot bring the database online and this is your production system, please raise a "very high" priority SAP ticket and I can take over from my time zone, so you will have 24/7 help. If you raise a ticket, please upload the latest .tgz file.
Best regards,
James
Hi James;
at the moment I am trying to bring up our backup instance, as the master database doesn't seem to be in good shape. We're not using MaxDB with any other SAP software, so we can't raise a ticket with SAP for this. 😐 Gonna see whether we can get it to work with the backup instance...
Thanks and all the best,
Kristian
Hi Kristian,
Okay, please go ahead.
I loaded the check data result from the MaxDB kernel log. The result tells us what to do: first, check your hardware to make sure there is no hardware corruption or other hardware problem before you repair the data.
Luckily, all of the detected corruption is in indexes, so please open Database Studio and check whether there is a bad index warning in its Administration section.
If there is, you can easily fix the corrupted index by rebuilding it.
Otherwise, refer to section 14 of SAP note 839333 to identify the corrupted index using the information in the attached check data result, then fix it manually by rebuilding it.
dbmcli command:
sql_recreateindex <schema>.<table>.<index>
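For anyone who wants to script the rebuild step, here is a minimal Python sketch that validates the `<schema>.<table>.<index>` argument and builds the `dbmcli` call. It assumes `sql_recreateindex` accepts the same `-d <dbsid> -u control,<password>` connection options as the `diag_pack` command posted elsewhere in this thread; that is an assumption, so check the dbmcli documentation (your setup may additionally need an SQL session). The function names, database name and index names are hypothetical. The database must be in Online state, not Admin, for the rebuild to work.

```python
import re
import subprocess

# Hypothetical check for <schema>.<table>.<index>: exactly three non-empty,
# dot-separated names. Quoted identifiers containing dots are NOT handled.
_QUALIFIED_INDEX = re.compile(r"^([^.\s]+)\.([^.\s]+)\.([^.\s]+)$")


def build_recreate_index_cmd(dbsid, password, qualified_index):
    """Build the dbmcli argument list for sql_recreateindex.

    ASSUMPTION: the same -d/-u (control user) options as diag_pack work here.
    """
    if not _QUALIFIED_INDEX.match(qualified_index):
        raise ValueError(
            f"expected <schema>.<table>.<index>, got {qualified_index!r}")
    return ["dbmcli", "-d", dbsid, "-u", f"control,{password}",
            "sql_recreateindex", qualified_index]


def recreate_index(dbsid, password, qualified_index):
    """Run the rebuild and return (exit code, dbmcli output).

    The database must be ONLINE; rebuilding is not possible in Admin state.
    """
    cmd = build_recreate_index_cmd(dbsid, password, qualified_index)
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems with the password, and the validation catches a malformed index name before anything is sent to the database.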
Best regards,
James
Hi James;
and thanks for your input. I'll have a look at how things are. The MaxDB instance runs inside a VM on our VMware infrastructure, and after restarting the VM I am at least able to get the database instance into Admin mode. I'm taking a VM snapshot now, just to be safe before going any further, and after that I'll try a new full backup of the DB instance. Everything else later...
Cheers and thanks again,
Kristian
... one more question on that: is there any way to select or delete a corrupt index while the database is in Admin state? From what I see now, we could get out of this pretty easily if we could bring the database online, but we can't run SQL statements in Admin mode, so even finding out which index is corrupt is hard. 😐
TIA and all the best,
Kristian
Hi Kristian,
We cannot fix the bad index in Admin state; this has to be done in Online mode.
However, if it is really difficult to bring the database online, you can try to rebuild the primary instance from the standby instance. This is easy: take a complete offline backup of the standby instance, then restore it to the primary instance using the "restore with initialization" option.
Best regards,
James
Hi James;
and thanks a bunch for your hints. Following the help outlined here, as well as very good support from the Infolytics people (we're not a direct SAP customer, as we do not use any SAP software other than MaxDB), we were able to resume normal operations yesterday and got our system back online with minimal data loss (just a few entries created after office hours had to be recreated manually in the application). So far, so good. The database is now running on a new VM, updated to the latest .02 build, and error-free, at least for the moment. Let's see how this goes.
Again, thanks for your help and all the best!
Kristian
Hi Kristian,
You are welcome. Now that the corruption is fixed, the occasional MaxDB crashes should go away. If the same crash happens again, please post in this SCN thread again and attach the latest
trace. For MaxDB topics in general, there is a very good video training held by the MaxDB developers, which you can access from this link:
SAP MaxDB: The SAP Database - Training
Wish you good luck.
Best regards,
James
So, I managed to run the database check tonight. Even though I failed to copy the error text out of Database Studio, the entries written to KnlMsg(Archive) (please see attached) don't look the way they should. "File check failed"? "File not accessible"? Not good... but which file? Is there any way to track down in more detail what's wrong in these situations? And is there any way to resolve these issues?
TIA and all the best,
Kristian
Hi James;
thanks, and sorry: here we go. Along the way, I ran two checks (one on the whole database, one on just a single table) and also added a new volume, as the database is filling up quickly these days. I hope you can get something out of these message files anyway...
Thanks again and all the best,
Kristian
Hi Kristian,
Please execute the command below on the database host to collect the logs:
dbmcli -d <dbsid> -u control,<password> diag_pack
Afterwards, you will find the diagpack.tgz package in the MaxDB run directory.
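For anyone who wants to script this step, a minimal Python sketch: it runs exactly the `diag_pack` command shown above and then looks for `diagpack.tgz` in the run directory. The run directory path is not given in this thread, so it is a parameter you have to supply; the function names are hypothetical.

```python
import subprocess
from pathlib import Path


def diag_pack_cmd(dbsid, password):
    # Mirrors: dbmcli -d <dbsid> -u control,<password> diag_pack
    return ["dbmcli", "-d", dbsid, "-u", f"control,{password}", "diag_pack"]


def find_diagpack(rundir):
    """Return the path to diagpack.tgz in the given run directory, or None."""
    candidate = Path(rundir) / "diagpack.tgz"
    return candidate if candidate.is_file() else None


def collect_diagpack(dbsid, password, rundir):
    """Run diag_pack, then return the archive path for attaching to the thread.

    Raises CalledProcessError if dbmcli exits with a non-zero status, and
    FileNotFoundError if no diagpack.tgz shows up in rundir afterwards.
    """
    subprocess.run(diag_pack_cmd(dbsid, password), check=True)
    path = find_diagpack(rundir)
    if path is None:
        raise FileNotFoundError(f"diagpack.tgz not found in {rundir}")
    return path
```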
Please attach it to this discussion.
Cheers,
James
Hi Kristian,
You are welcome.
You are using MaxDB "7.8.02 BUILD 028-121-245-008", so the problem is similar to a known bug, PTS 1250143: some bad indexes are being accessed.
Do you run check data regularly? If so, please let me know when you last ran it and what the result was.
Otherwise, please run check data while the system is not under heavy load and let me know the result. To run it, you can use Database Studio -> Check DB, or the dbmcli command described in SAP note 940420.
Cheers,
James
Hi James;
and thanks for your continuing support, greatly appreciated! Well, I remember we set up a cron job for some tasks earlier, including statistics updates, but I'll check whether it also runs check data. Just to make sure: should I wait until after office hours to start a data check? I don't want to risk the instance going down in production again...
Cheers & thanks again,
Kristian
Hi Kristian,
You are welcome. Yes, please make sure check data runs after office hours.
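Since the check should only start outside office hours, a cron job or a manually started script can be guarded with a simple time check. A minimal Python sketch, assuming hypothetical office hours of 08:00 to 18:00 on weekdays; the check data command itself is not included (see SAP note 940420 for that), and the names are illustrative only.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical office hours; adjust to your site.
OFFICE_START = time(8, 0)
OFFICE_END = time(18, 0)


def is_off_hours(now: datetime) -> bool:
    """True on weekends, or outside [OFFICE_START, OFFICE_END) on weekdays."""
    if now.weekday() >= 5:  # Saturday=5, Sunday=6
        return True
    t = now.time()
    return not (OFFICE_START <= t < OFFICE_END)


def should_start_check_data(now: Optional[datetime] = None) -> bool:
    """Guard to call before launching check data (command not shown here)."""
    if now is None:
        now = datetime.now()
    return is_off_hours(now)
```

Scheduling the cron job at night already covers the normal case; the guard mainly protects against someone starting the same job by hand during the working day.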
I've listed the stack backtrace from the MaxDB core dump below, so that everyone interested in this topic can follow it without opening the attached log:
----> Symbolic stack backtrace <----
0: tfind@@GLIBC_2.2.5 + 0xeb70
1: _ZNK11cbd600_Node18bd600EvalSepKeyLenERK24Data_IndexKeyDescriptionRS0_ + 0x69
2: _ZNK11cbd600_Node22bd600BuildSeparatorKeyERS_R23Data_IndexKeyWithLength + 0x187
3: _ZN11cbd500_Tree25bd520_DistributeFromRightER11cbd600_NodeiibR19cbd502_ReorgContext +
4: _ZN11cbd500_Tree15bd520_UnderFlowER19cbd502_ReorgContextb + 0x869
5: _ZN11cbd500_Tree18bd520LeafUnderFlowERK24Data_IndexKeyDescription + 0x1aa
6: _Z19bd400DelFromInvTreeR17cbd300_InvCurrentRK24Data_IndexKeyDescriptionS3_b23tbd00_Del
7: b03del_inv + 0x9a4
9: k61del_select + 0x352
10: _Z17k721result_handleR16SQLMan_MessBlockR18tgg07_select_paramR23tgg00_SelectFieldsPara
11: _Z15kb72select_getsR16SQLMan_MessBlockR10tgg00_LkeyR18tgg07_select_paramR23tgg00_Selec
12: _Z11k720_selectR16SQLMan_MessBlockR18tgg07_select_param + 0x794
13: _Z18k720_single_selectR16SQLMan_MessBlock + 0x65
14: k61del_upd_qual + 0xc8
15: _Z12k05functionsR16SQLMan_MessBlock + 0x802
16: _Z17a06lsend_mess_bufR14SQLMan_ContextR16SQLMan_MessBlockbRsR20SAPDBErr_MessageList +
17: _Z20a06dml_send_mess_bufR14SQLMan_ContextR16SQLMan_MessBlockR26SQLMan_DMLStatementCont
18: _Z16a505most_executeR14SQLMan_ContextR26SQLMan_DMLStatementContextR16tak_changerecordR
19: _Z21a505loop_most_executeR14SQLMan_ContextR26SQLMan_DMLStatementContextR16tak_changere
20: _Z24a501exec_with_change_recR14SQLMan_ContextR26SQLMan_DMLStatementContextR11tak_parsk
21: _Z11a501executeR14SQLMan_ContextR10tak_parsid + 0x29c
22: _Z17a92_mode_analyzerR14SQLMan_Context23tak_ddl_descriptor_Enumb + 0x138b
23: _Z15ak93one_commandR14SQLMan_ContextiRibRb + 0x77b
24: _Z17a93_user_commandsR14SQLMan_ContextRbS1_ + 0x60b
25: _Z7SQLTaskR14SQLMan_ContextP22Kernel_DatabaseContextR13RTETask_ITaskRbb + 0x71
26: _Z11Kernel_MainR13RTETask_ITask + 0x1fa
27: _ZN12RTETask_Task14KernelTaskMainEv + 0x120
28: _ZN17RTEExec_Coroutine14StartCoroutineEPS_j + 0x42c
29: _dl_tls_get_addr_soft@@GLIBC_PRIVATE + 0x419c0
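To work through longer traces like this one, it can help to split each frame into number, symbol and offset. A minimal Python sketch; the regular expression is an assumption based only on the format shown above, and the names are hypothetical. It keeps truncated lines (offset `None`) and reports gaps in the frame numbering.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    index: int
    symbol: str
    offset: Optional[int]  # None when the line was cut off before "+ 0x..."


# "<n>: <symbol> + 0x<hex>", where the offset part may be missing or truncated.
_FRAME_RE = re.compile(r"^\s*(\d+):\s+(\S+)(?:\s+\+\s+0x([0-9a-fA-F]+))?")


def parse_backtrace(text: str) -> List[Frame]:
    """Parse frame lines; header lines such as '----> ... <----' are skipped."""
    frames = []
    for line in text.splitlines():
        m = _FRAME_RE.match(line)
        if m:
            idx, sym, off = m.groups()
            frames.append(Frame(int(idx), sym, int(off, 16) if off else None))
    return frames


def missing_frames(frames: List[Frame]) -> List[int]:
    """Frame numbers absent between the lowest and highest parsed frame."""
    seen = {f.index for f in frames}
    if not seen:
        return []
    return [i for i in range(min(seen), max(seen) + 1) if i not in seen]
```

Applied to the trace above, `missing_frames` would return `[8]`, because the pasted list jumps from frame 7 to frame 9, and frames such as 3, 6 and 10 come back with `offset=None` since those lines were cut off. The mangled C++ symbols can then be made readable with a demangler such as `c++filt`.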