"Bad resume count detected" MaxDB 7.5

Former Member

RHEL3update4 - 2.4 kernel in VM.

MaxDB 7.5.0.55

SAP Ent. 4.7 (6.40)

For the life of me I can't figure this out. I installed a system and restored it from backup; the SAP system (4.7) runs for almost exactly 11 days and then crashes. In the knldiag.err file I get the message "T(some task #) Bad Resume Count Detected:138" (the count is usually over 100). I've tried upgrading to build 55 of 7.5, with the same results. Only the DB crashes, not the SAP instance. The stack back trace seems to point to a kernel issue, but I can't see what's going on.
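To track how often this happens and which tasks are affected, something like the following could pull the relevant lines out of knldiag.err. This is only a rough sketch: the regular expression assumes the message appears as quoted above (task label, then the text, then the count), the helper name `find_bad_resume_counts` is made up for illustration, and the task number in the sample line is hypothetical.

```python
import re

# Matches text such as "T127 Bad resume count detected:128".
# Case-insensitive, because the capitalization of the message varies when quoted.
# Assumption: the task label and the message sit next to each other on one line.
RESUME_RE = re.compile(r"\bT(\d+)\s+Bad resume count detected:\s*(-?\d+)", re.IGNORECASE)


def find_bad_resume_counts(lines):
    """Return a list of (task_id, resume_count) tuples, one per matching line."""
    hits = []
    for line in lines:
        match = RESUME_RE.search(line)
        if match:
            hits.append((int(match.group(1)), int(match.group(2))))
    return hits


# Hypothetical sample input; the task number 12 is invented for the example.
sample = [
    "some unrelated diagnostic line",
    "T12 Bad Resume Count Detected:138",
]

for task_id, count in find_bad_resume_counts(sample):
    print(f"task T{task_id}: resume count {count}")
```

Since the function accepts any iterable of strings, an open file object for knldiag.err can be passed to it directly.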

I did see an SAP Note about this: http://maxdb.sap.com/webpts?wptsdetail=yes&ErrorType=0&ErrorID=1141905

It seems as though it might have to do with savepoints in the system, but I'm not sure where to begin regarding that. Maybe it's a parameter change I have to make?

We are unable to upgrade MaxDB any further due to the lack of a support contract and the legacy system on which it is installed. It is worth noting that I have a current system with the same CPU/memory setup, and I've tried to match its parameters, but for some reason this new one crashes where the other does not. Lastly, these are all running in VMs.

Any help is GREATLY appreciated (I know the setup is antiquated).

Chris

Accepted Solutions (0)

Answers (2)

thorsten_zielke
Contributor

Hi Chris,

hmm, there are two related bug fixes in our "problem tracking database", which are

  1. PTS1142584 (autosave_end suspends wrong task if autosave is running, fixed in 2009 with MaxDB 7.6.06).
  2. PTS1208411 (kernel crash, because resume count of 345 is out of range of the allowed 0-1 value, fixed in 2012 with 7.6.06.24).

But unfortunately for you these fixes were only downported to 7.6 and no correction was done for 7.5.

If you attach the callstack here, I can let you know if your crash indeed matches a known issue. For example, here is the callstack from PTS1208411:

2009-11-10 15:07:22
T71 Bad resume count detected:345

Symbolic Stack Back Trace
0: eo670_CTraceStackOCB
1: vabort
2: en56_StoreResumer_17_3
3: en56_remvresume
4: NextScheduledTask__21RTETask_TaskSchedulerFv
5: vsuspend
6: Suspend__12RTETask_TaskFUi
7: Run__13SrvTasks_TaskFv
8: Kernel_NonUserTask__FR13RTETask_ITaskR13Trans_ContextR20SAPDBErr_MessageList
9: Kernel_Main__FR13RTETask_ITask
10: RTETask_TaskMain
11: en88_CallKernelTaskMain__FP9TASK_TYPE
12: en88_CallCoroutineKernelTaskMain
...

Thorsten

Former Member

Hi Thorsten,

Thanks so much for this confirmation.  I expected this to be the problem.  The only thing I can't figure out is why it's happening now.  This is a restore of a working production system.  I guess it has to do with my platform which is now a 32-bit VM vs an ia64 physical server. Other than that, the OS, SAP and DB version are the same except for my recent DB upgrade in efforts to fix this issue.

Here is the call stack. If you can offer any more insight as to what might be the cause, or possibly a way to fix this without upgrading to 7.6, it would be infinitely appreciated!

T127 Bad resume count detected:128

...

0: 0x0881bae8 eo670_CTraceStack +0x0028
1: 0x0883ba7d vabort +0x0039
2: 0x0884d8dd en56_StoreResumer +0x012d
3: 0x0884cf99 vresume +0x0105
4: 0x084475f1 bd20_ResumeOccupant +0x0041
5: 0x08444eb9 bd20UsePage +0x01e9
6: 0x0846c64e bd13GetNode +0x052a
7: 0x084523b3 b36nqual_from_tree +0x068f
8: 0x08437754 b02kb_select_rec +0x04f4
9: 0x0841f6a7 k75_fetch +0x08f3
10: 0x083b4589 k05functions +0x075d
11: 0x0816f197 a06lsend_mess_buf +0x0493
12: 0x0816ec29 a06ignore_rsend_mess_buf +0x0185
13: 0x08170b65 a06rsend_mess_buf +0x003d
14: 0x0829c2e5 a507select_part +0x0e95
15: 0x0829a4ca a507last_command_part +0x00c2
16: 0x0829225b a505most_execute +0x1313
17: 0x08290a4d a505loop_most_execute +0x0419
18: 0x0828a75a a501exec_with_change_rec +0x0836
19: 0x0828acb0 a501execute +0x031c
20: 0x08334a6f ak92analyze_messagetype +0x05eb
21: 0x083358dd a92_mode_analyzer +0x01cd
22: 0x0833a245 ak93one_command +0x0789
23: 0x08338dfd a93_user_commands +0x0485
24: 0x081556a1 ak91run_user_process +0x0125
25: 0x08155945 a91mainprogam_with_allocator +0x0041
26: 0x0858ba6a gg941CreateAllocatorAndCallMainprog +0x01f6
27: 0x081558e8 a91mainprogram +0x003c
28: 0x08861c50 en88_CallKernelTaskMain__FP9TASK_TYPE +0x01bc
29: 0x08862146 en88_CallCoroutineKernelTaskMain +0x002e

Thank you!

Chris

thorsten_zielke
Contributor

Chris,

Your call stack does indeed look very similar, and you are very likely facing the issue described in PTS1208411.

"Appearance:
Kernel crashes with an error message 'Bad resume count detected'

Preconditions and circumstances:
Error occurs during normal operation

Workaround:
none

Solution:
Error was caused by a counter overrun error. A negative value used in combination with a modulo operation would lead to a negative array index. Now, signed values will be used instead of integers"
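To illustrate the kind of failure that solution text describes, here is a small sketch. It is not MaxDB's actual code: the queue length `QUEUE_SIZE` and both helper functions are hypothetical, and the helpers only emulate in Python what a C program typically does (remainder with the sign of the dividend, two's-complement wraparound of a 32-bit counter).

```python
def c_remainder(a, b):
    """Emulate C's `a % b` for integers: the result takes the sign of `a`."""
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def wrap_int32(n):
    """Emulate the usual two's-complement wraparound of a signed 32-bit counter."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


QUEUE_SIZE = 8  # hypothetical length of a ring buffer indexed by the counter

# Counter advanced four steps past the largest signed 32-bit value.
counter = wrap_int32((2**31 - 1) + 4)

# Signed arithmetic: the negative counter yields a negative slot number.
slot_signed = c_remainder(counter, QUEUE_SIZE)

# Same bit pattern read as unsigned: the slot stays inside 0..QUEUE_SIZE-1.
slot_unsigned = (counter & 0xFFFFFFFF) % QUEUE_SIZE

print(counter, slot_signed, slot_unsigned)
```

In C, a negative index such as `slot_signed` would address memory in front of the array, which is how a bookkeeping value like the resume count could end up holding garbage. The helpers are needed because Python's own `%` operator never returns a negative result for a positive divisor, so it would hide the effect.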

When this bug appeared some years ago, we had only seen rare cases and probably none in 7.5; however, I cannot predict how often your specific VM system will be affected. It is certainly odd that it always appears about 11 days after the system was recovered from backup.

You might want to test if this issue occurs repeatedly on the new server (without moving your productive system over, please just test), but I am hesitant to call this a feasible solution.

Sadly, there appears to be no workaround besides upgrading to 7.6, 7.7, 7.8 or 7.9, but due to its age and the effort involved there are no current plans for releasing a new version of 7.5.

Could you clarify further why an upgrade to a newer version is not an option for you?

Thorsten

Former Member

Hi,

http://maxdb.sap.com/webpts?wptsdetail=yes&ErrorType=0&ErrorID=1141905

MaxDB PTS - Problem Tracking - Error 1141904: Kernel

This seems to be a known bug, and it looks like there is a MaxDB patch to apply.

To apply patches for this you could follow:

1020175 - FAQ: SAP MaxDB installation, upgrade or applying a patch.


Sorry I cannot be of more help; I am not a MaxDB expert.


Kind Regards,


Johan