
Error "-9400 AK Cachedirectory full"

Former Member
0 Kudos

Hello,

I'm writing back following an old thread from 2009 on this forum about a problem with MaxDB and "AK Cachedirectory full" errors. You can find the previous thread here:

The problem was actually never resolved: we could more or less live with it and managed to reduce it for a while, but we are now having it again almost every day. We have fixed various things since 2009, and our system has changed quite a lot.

We use MaxDB 7.8.02 (BUILD 038-121-249-252) with the JDBC Driver sapdbc-7.6.09_000-000-010-635.jar. Note that we don't use MaxDB in a SAP environment as we have our own business application.

Following some very helpful feedback from Lars Breddemann, we fixed various points in our system: for example, result sets were not always properly closed; they are now closed immediately after the query has been executed and the result rows have been read. We also follow the advice from Elke Zietlow to always close a connection and its associated prepared statements when the error occurs. This helps in most cases, but sometimes even closing the connection and its prepared statements does not help, and the problem "escalates" until we have to restart the DB to fix it.
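To illustrate the closing pattern we adopted (a minimal, database-free sketch: TrackedResource is a hypothetical stand-in for JDBC objects such as PreparedStatement and ResultSet, which would need a live database), Java's try-with-resources guarantees close() is called in reverse declaration order, even when an exception is thrown:

```java
// Minimal, database-free sketch of the try-with-resources closing pattern
// (Java 7+). TrackedResource is a hypothetical stand-in for JDBC objects
// such as PreparedStatement and ResultSet, which require a live database.
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    static final List<String> closed = new ArrayList<>();

    static class TrackedResource implements AutoCloseable {
        private final String name;
        TrackedResource(String name) { this.name = name; }
        @Override public void close() { closed.add(name); }
    }

    public static void main(String[] args) {
        // With real JDBC this would read:
        // try (PreparedStatement ps = conn.prepareStatement(sql);
        //      ResultSet rs = ps.executeQuery()) { ... read rows ... }
        try (TrackedResource stmt = new TrackedResource("statement");
             TrackedResource rs = new TrackedResource("resultset")) {
            // read the result rows here
        }
        // Both resources are now closed, in reverse declaration order,
        // even if an exception had been thrown inside the block.
        System.out.println(closed);
    }
}
```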

Coming back to the 2009 discussion, I used the two statements given by Lars to monitor the catalog cache usage: when I run them multiple times, I see that all result sets are properly closed, as I only see the ones currently in use and they disappear afterwards.

One important point is that our Java application keeps many prepared statements open in a cache, to have them ready for reuse. We can have up to 10'000 prepared statements open, over up to 100 JDBC connections. The "AK Cachedirectory full" problem sometimes happens very soon after we restart our system and the database, when the number of prepared statements is still very low, which seems to indicate that the number of open prepared statements is not necessarily linked to the problem.

Also in the 2009 discussion, Lars mentioned the fact that we use prepared statements of type TYPE_SCROLL_INSENSITIVE, and he asked whether we could use TYPE_FORWARD_ONLY instead. Would this really make a difference? We need TYPE_SCROLL_INSENSITIVE in many cases because we use iterators to scroll up and down the result sets, so using TYPE_FORWARD_ONLY would require changing quite a lot of code. I also saw in the MaxDB code that using TYPE_SCROLL_INSENSITIVE adds the string "FOR REUSE" to the SQL statement; what exactly does that mean?
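For context, the two result-set types in question are standard JDBC constants on java.sql.ResultSet. In general JDBC terms (the MaxDB-specific "FOR REUSE" behaviour is a driver/server detail I cannot confirm here), a forward-only cursor lets the driver discard rows once they are read, while a scroll-insensitive cursor must keep the whole result addressable, which is typically more expensive:

```java
// The standard JDBC cursor-type constants involved (java.sql.ResultSet).
import java.sql.ResultSet;

public class CursorTypes {
    public static void main(String[] args) {
        // Forward-only: rows can be streamed and discarded once read.
        System.out.println("TYPE_FORWARD_ONLY       = " + ResultSet.TYPE_FORWARD_ONLY);
        // Scroll-insensitive: the result must stay addressable for
        // scrolling in both directions, independent of later changes.
        System.out.println("TYPE_SCROLL_INSENSITIVE = " + ResultSet.TYPE_SCROLL_INSENSITIVE);
    }
}
```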

Any help to fix that problem would be greatly appreciated.

Christophe

Accepted Solutions (0)

Answers (2)


thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

this case is more complicated than expected. We were not able to recreate the bug here (even using your exported catalog schema) and the kernel trace did not give us enough information.

So the only option I see is that we build another kernel with enhanced trace output for you. Should work the same way as before, but write more info into the trace file.

Of course, it would also help us to have a copy of all your data, but I assume that this is something you would rather not do...

Regards,
Thorsten

Former Member
0 Kudos

Hello Thorsten,

Just to check: I'm not sure what catalog version you use (is this exported with each diagpack?), but did it include the constraints on tables PERSON and ACCOUNTTRANSACTION? As I told you in a previous post, we could not reproduce the error when these constraints were missing.

We can of course reproduce the bug with another kernel with more traces if you want to. That is really no problem for us.

Regarding the data, this might indeed not be possible ... but I can check anyway.

Best regards,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

we do have all table and index definitions plus constraints and foreign keys from your database. That is what we asked for when extracting the database catalog via the loader. It does not include any data from your database, of course...
And, yes, we are aware that the bug is caused by a problem handling the very complex constraint set on your tables. That is also the reason why we have not seen these crashes at other customer installations.

I will try to have the new kernel ready as soon as possible...

Best regards,
Thorsten

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

ok, here is a new test kernel with even more debug output in the kernel trace:

https://mdocs.sap.com/mcm/public/v1/open?shr=Bl8zhCTx8eoKDaT2X1fjPklbbz2toL1GCya2f1TWZtw

Same procedure as before, but this time I would like you to reproduce the error several times in different trace files (to check if the error pattern is always the same).

Therefore, proceed as follows:
1. Trigger the error
2. flush the trace "dbmcli ... trace_flush"
3. convert the binary trace on disk to text via "dbmcli ... trace_prot akbxms"
4. rename the resulting <db>.prt trace file to prevent overwriting
... and repeat ca. 3 times

Best regards,
Thorsten

Former Member
0 Kudos

Hello Thorsten,

I just uploaded 3 diagpack archives for the same error reproduced 3 times. It's exactly the same error, each time an UPDATE in the table ACCOUNTTRANSACTION on the same data record.

If you want to, we can also generate another error on another data record or another table (PERSON is a good candidate).

Thanks again for your help,

Christophe

thorsten_zielke
Contributor
0 Kudos


Hello Christophe,

good idea, if it's not too much trouble for you, please also try to generate the error on a different table...

Also, is the failing statement still just a simple UPDATE command (but explicitly setting values for all fields in that table), as you had explained before?

Regards,
Thorsten

Former Member
0 Kudos

Hello Thorsten,

Yes, the errors are generated each time by an UPDATE command where we explicitly (re)set all the fields of the data record.

I have now uploaded 3 more diagpacks; this time I generated the error with the table ORDERMAIN. I managed to reproduce exactly an error we had a few weeks ago when we first upgraded to 7.9. The problem is systematic: I can easily reproduce it, so I generated the same error 3 times. However, during this test the ORDERMAIN record was first inserted and then updated (and then the error happened). I then deleted the record, re-inserted a new one and updated it again, and the error happened again. I did that 3 times.

Note that the running DB is the same as yesterday (no restart in between).

I hope it helps and you will be able to find the problem.

best regards,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

ok, bug found and fixed. Please download the latest 'kernel.bin' file (date 25.03.2015) for 7.9.08 (same link as before) and let me know if it helped.

Regards,
Thorsten

Former Member
0 Kudos

Hi Thorsten,

With this kernel, we (Christophe and me) were not able to reproduce the bug any longer.

Does this version contain further logging or is it possible to use it for a more "production like" test with a lot more load?

Going forward, I will take care of the testing on our side, as Christophe is unfortunately unavailable for this task and has handed it over to me.

best regards,

Daniela

thorsten_zielke
Contributor
0 Kudos

Hi Daniela,

yes, you can use this kernel for a 'production like' test. It contains only this bug fix plus additional trace writing in case of this specific error; this has no performance impact.
Just keep in mind that you have a very unique database scenario and although I do not expect any further errors here, there still is a chance for some yet unknown bug.

Best regards,
Thorsten

Former Member
0 Kudos

Hello Thorsten,

My colleague Daniela is on holiday this week: she will probably schedule a "real" test when she comes back, that is, a test with more users and load. Let's hope that this time the bug is fixed.

I will no longer be involved but I wanted to thank you (and your colleagues) once again for your excellent support.

Best regards,

Christophe

simon_matter
Participant
0 Kudos

Hi Thorsten,

I'm coming back to this issue again.

First I was looking for a new release and found that indeed one is available: 7.9.08.30

My question: is the bug we discussed here fixed in 7.9.08.30?

BTW, only the 64-bit version is available; the 32-bit version is missing from the download.

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

sorry for not mentioning the bug tracking number, it is PTS 1253544:
"fix for move error during update on table with foreign key condition, if table is both referencing and referenced in a foreign key definition"

The fix is available since MaxDB 7.9.08.31. Unfortunately, the SCN releases have not yet been updated to our current version 7.9.08.32. I have triggered the process now, but it may take until the end of next week until the new versions are available.

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

I checked the storefront for 7.9.08.32 yesterday, just to find out that MaxDB for Linux has vanished there; only dbstudio is still available. Do you know why?

Thanks,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

the new MaxDB versions should be available soon - we delivered them to the SAP store team last week.
For reasons unknown to me, the process of making these binaries available always takes more time than expected...
If it's not in the store by the end of this week, remind me to follow up on this...

Regards,
Thorsten

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

just to let you know: MaxDB 7.9.08.32 should be available for download now - however, I have checked only the Windows package...

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Thanks Thorsten, the only thing left is that the Linux 32-bit version is still missing.

Thanks,

Simon

thorsten_zielke
Contributor
0 Kudos

Oops, 32 bit... I have to say that we have discontinued supporting the 32-bit versions of 7.9.08 and 7.8.02.
I can still send you a download link for 7.9.08.32 32 bit (later today or by tomorrow), but this may be the last MaxDB 32-bit kernel I can offer.

If at all possible, I would strongly recommend moving to a 64-bit OS platform: especially with databases it makes sense to run 64 bit, to allow for bigger data cache sizes.

Just to be clear - we have only stopped delivering new versions for 32-bit operating systems. If your OS and processor allow it (which they should unless the machine is 10+ (?) years old), you can simply upgrade your MaxDB software to MaxDB 64 bit.

Regards,
Thorsten

simon_matter
Participant
0 Kudos

ok, no problem - it's only a test VM I have around here that is still 32-bit; the real stuff is all 64-bit, of course.

Thanks,

Simon

thorsten_zielke
Contributor
0 Kudos

Here is the download link for Linux 32 bit for MaxDB 7.9.08.32:

https://mdocs.sap.com/mcm/public/v1/open?shr=i41fmLy164kRb2ekoKxJzANXE9z7cvcoxT0PszUrj7k

glad to hear that this 32 bit VM is for testing only - but for future MaxDB tests, please use a 64 bit VM instead... 🙂

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

Finally I'd like to give some feedback: we upgraded our main DB to 7.9.08.32. Unfortunately it didn't work any better than 7.8: it would just segfault from time to time. After painful hours we migrated back to 7.8.02.38 and things are stable again.

Unfortunately we couldn't reproduce the crashes, they just happened sporadically. We also have no test environment to investigate further.

The only thing I was able to get is the data from /var/opt/sdb/globaldata/wrk/<$INSTANCE>/DIAGHISTORY/, which I have put in a tarball. The bzip2-compressed archive is ~1 GB in size, and I'm wondering if it would help you to find out what's going wrong here?

The archive contains the following files:

-rw-rw---- 1 200 200     557056 17. Sep 16:59 AK00001.dmp
-rw-rw---- 1 200 200      16384 17. Sep 16:59 AK00001.stm
-rw-rw---- 1 200 200 6403178496 17. Sep 17:00 knldump
-rw-rw---- 1 200 200   10959335 17. Sep 17:03 KnlMsg
-rw-rw---- 1 200 200   28753920 17. Sep 17:03 knltrace
-rw-rw---- 1 200 200     728279 17. Sep 16:59 rtedump
-rw-rw---- 1 200 200      12288 17. Sep 17:03 RTEMemory_Chunk.0000000800000000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800003000
-rw-rw---- 1 200 200       4096 17. Sep 17:03 RTEMemory_Chunk.0000000800103000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800104000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800204000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800304000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800404000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800504000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800604000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800704000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800804000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800904000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800a04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800b04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800c04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800d04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800e04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000800f04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801004000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801104000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801204000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801304000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801404000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801504000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801604000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801704000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801804000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801904000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801a04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801b04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801c04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801d04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801e04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000801f04000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000802004000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000802104000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000802204000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000802304000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000802404000
-rw-rw---- 1 200 200    1048576 17. Sep 17:03 RTEMemory_Chunk.0000000802504000
-rw-rw---- 1 200 200          0 17. Sep 17:03 RTEMemory_Table.00000008000fd080

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hello Simon,

the KnlMsgArchive file would have been very helpful to check if all the errors were showing a similar call stack...
Still, let me have a look at the KnlMsg file - please use the following link for upload:

https://mdocs.sap.com/mcm/public/v1/open?shr=6niQeUqECJbmUJI0HPPsifIFQwBUHRqrjjUYPRGgYCg

Kind regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

I have uploaded KnlMsg. Unfortunately KnlMsgArchive is not available anymore because it doesn't get copied to the diaghistory directory at the time of the crash.

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

can you also upload the two 'AK*' files and the 'RTEMemory*' files?

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

I've packed everything into a tarball except the knldump file. I hope it helps.

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

the call stack in the KnlMsg file indicates that we have a problem with the 'UseAlterTableAddColumnOptimization'. This performance optimization is enabled in newer versions of MaxDB 7.9, and we are not aware of any crashes in this area.

To trigger this abort, the following conditions have to be met:
1. alter table add column (adds a column to an existing table)
2. update on that table

We would very much like to investigate this crash further (especially since we have made 'YES' the new default value for this parameter), so it is rather unfortunate that you no longer have that system available. How did you move back to 7.8? Is there any chance that we can find such a command sequence (alter table add & update) in the SYSDDLHISTORY? Are you aware of a situation in which your application needed to add columns to existing tables recently?

I believe that you could have stayed on MaxDB 7.9 by simply changing that parameter value from 'YES' to 'NO' (unless there were crashes with other symptoms I have not seen yet, of course). And on 7.9 you would still benefit from the bug fix for which you needed 7.9 in the first place...

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

Thanks for your info, it's much appreciated.

This is what we found out:

We had an "alter table add column" on that day, but it was around 9:06 and the first crash occurred at 11:28. We had four more crashes, each occurring about an hour after the previous one.

Is it plausible that the crashes occur in that way, hours after the 'alter table'?

Fortunately I found all the KnlMsg from that day in our backup and I can confirm that they all contain the same "Symbolic stack backtrace", indicating it was always the same crash from what I understand.

Unfortunately I've been told that we can not do more live tests with our main instance to diagnose things, but we would like to try to reproduce this with one of our less important instances. They already run 7.9 and we can try things there.

What do you suggest to try there? Doing "alter table add col" on a large table and then updates immediately while the db is loaded with other tasks? I guess there must be a certain condition met for the crash to happen.

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

how to trigger that bug is exactly the problem - since you told me that your database crashed only ca. 2.5 hours after a table was modified by the 'alter table add column' command, there is apparently no need to issue an update command right after a table column was added. Also, it seems rather unlikely that database load is important for this bug to occur, unless it is load on that very same table...

Do you frequently use the 'alter table add column' command in your application? Probably not or do you have a scenario where you add table columns frequently?

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

From looking at SYSDDLHISTORY I can confirm that we do 'alter table add column' quite often, almost daily.

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

ok, and I have to add that, in contrast to my previous statement about the parameter UseAlterTableAddColumnOptimization preventing this crash in MaxDB version 7.9, this might not be true: we just discovered that some of the new code is executed even with that optimization disabled. So I would be careful about upgrading your productive system to 7.9 again as long as we have not found and fixed that bug. Sorry 😞

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

Were you able to reproduce the problem somehow?

Please let us know if there is any news on this issue.

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

no, we could not reproduce the problem. It might help if you could upload your 'sysddlhistory' content; maybe we will see some pattern that gives us a hint on how the error was triggered.

Regards,

Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

I've sent you the dump as SYSDDLHISTORY.txt.gz. Please note that it doesn't include the day when we tested 7.9, but nothing was different from other days.

Thanks,

Simon

simon_matter
Participant
0 Kudos

Hi Thorsten,

Will there be a new 7.8 release where at least our former bug is fixed? It would already help a bit.

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

since the 'alter table add column' optimization is an important feature/requirement for MaxDB 7.9, we would be very interested in locating and eliminating that bug, but we might need your help.

You had mentioned that on the crash day, around 9:06, an 'alter table add column' was performed. Can you let me know which table was involved? This would help in going through the sysddlhistory, although of course we do not know whether that 'alter table' command was responsible for the crash or perhaps an older one.

In the next days we will go through our code and check for possible errors with that feature; if we do not find anything, we will have to use another approach.

It would be really helpful, if you could trigger that bug in your test environment, but if that is not happening, we will try to find another solution...

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

I'm afraid the sysddlhistory will be of no use to you, because the day of the crashes with the 7.9 version is not in there. The history is only from the working 7.8 version, and the 7.9 instance doesn't exist anymore.

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

yes, I know. But I thought that the "alter table / drop table / add column..." statements were a regular feature of your application (it seems a bit unusual to frequently add columns and later delete them, but I thought that was how your application was designed...).
In that case it would make sense to have the name of the table that crashed the 7.9 database, because very similar statements should also be in the sysddlhistory for 7.8.
But maybe my assumptions were wrong...

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

I know there are some create/drop table commands daily, but not alter table, as far as I know. And I'm sure the alter table.. on that day was a one-off command issued by one of the developers. So no related or repeating pattern would be found in the history.

Regards,

Simon

Former Member
0 Kudos

Hi Thorsten,

As one of the developers of this system, I can give some more information.

First, as an introduction: we don't have software releases in the classical way; we usually have a new version every day. Sometimes we have two versions running in parallel if new features are needed urgently. Nor do we have a test database, meaning we work on and change the live database directly!

We have two different ways to change the database scheme:

As Simon mentioned, we add temporary tables, insert data, do something with the data, and in the end drop those tables again. Those tables are never changed, just created, used and deleted again.

We as developers (currently 7) make schema changes. We only make some kinds of changes during the day:

- Add tables, indexes, foreign keys, constraints and triggers

- Change columns (defaults, nullable...)

- Change constraints and triggers

- Drop indexes, foreign keys, constraints and triggers

- What we rarely do, and only under certain circumstances:

  - Drop columns, when:

    - the column is brand new and was not yet used (usually a mistake during creation), or

    - the table will not be used for the day until we have the new software version (our framework can't handle deleted columns at runtime)

  - Add columns to a table where columns have previously been deleted and the datatype allows reuse of that deleted space (again, our framework has problems handling that case)

Deleting columns is done in the evening when no users are present. We restart our system after that but usually not the database server.

Regards,

  Daniela

thorsten_zielke
Contributor
0 Kudos

Hello Daniela and Simon,

thank you for the detailed reply.
While we are trying to trigger this bug with in-house testing, these questions came up:
Do you know if the 'alter table add column' command was executed with or without the 'default' option in 7.9?
Do you know or suspect on which table the alter command was issued which later crashed the database?

Regards,
Thorsten

Former Member
0 Kudos

Hi Thorsten

As we had to replay that change on the 7.8 database, we know what it was:

ALTER TABLE BUILDING_UNIT_CONTRACT ADD EXTEND_CONTRACT_DATE DATE

According to our auditing, there was no update or insert to that table after the 16th of September, and there is currently no record with a non-null value for this column. Our auditing for the 17th is not 100% reliable because of the migration back to 7.8, so it is possible that the value was set and reverted back to null on that day.

Regards

  Daniela

thorsten_zielke
Contributor
0 Kudos

Hello Daniela & Simon,

good news, we have finally found the bug - this should be fixed with MaxDB 7.9.08.35...

Regards,
Thorsten

simon_matter
Participant
0 Kudos

Hi Thorsten,

That's really good news, thanks a lot for your efforts!

Is MaxDB 7.9.08.35 already available or is there a planned release date?

Regards,

Simon

thorsten_zielke
Contributor
0 Kudos

Hi Simon,

the planned release date is in about 4 to 6 weeks - I do not even have a PTS bug tracking ID yet, I just wanted to let you know that we have located the bug as soon as I got the news...

Regards,
Thorsten

Former Member
0 Kudos

Hi again,

More useful information: we again had a few DB crashes today, and they all seemed to be related to one specific table where we could insert data, but trying to update any row failed with an "AK Cachedirectory full" error. In an attempt to find a solution, we temporarily disabled one "update" DB trigger on that table, and this immediately solved the problem. We reactivated the trigger to see if this was not a coincidence, and the problem immediately reappeared (although this time we got an "[-9206]: System error: AK Duplicate catalog information" error, which quickly led to a "Restart required" error). Deactivating the trigger again solved the problem. Note that this trigger usually works without any problem, so it is also unclear why it suddenly led to this behaviour.

Actually we also noticed in the past that adding or removing a constraint on that particular table could lead to suddenly having some "AK" errors. Also adding or dropping columns, or adding/removing foreign keys seemed to sometimes make that table "unstable".

Maybe that's a silly question, but is there a way we could check whether the internal "structure" of a table is somehow corrupted? For example, we also get a strange error: when a constraint is violated, the exception usually reports the wrong constraint, off by one from the real one (I don't remember if it reports the previous or the next constraint instead of the right one).

Any idea or hint about a possible problem?

Thanks,

Christophe

david_liu1
Advisor
0 Kudos

Hello,

You can check SAP Note 1334850 for more information.

A second workaround option you can try is to set

USEVARIABLEINPUT = YES (SAP Note 1001257).

Regards,

David

Former Member
0 Kudos

Hello David,

Thanks a lot for your answer. Unfortunately I don't have access to the support notes: we are using MaxDB in a non-SAP environment, and SAP support notes require a login. Would it be possible for you to send me the content by email?

Thanks in advance and best regards,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

I have read your old thread from 2009. Ironically, increasing the database parameter CAT_CACHE_SUPPLY, e.g. to 900000, and then restarting the database might indeed help in your specific case (as Anish John recommended at the beginning of the old thread back in 2009).

Let me suggest that you create an archive of the error log files by running 'dbmcli -d ... -u ... diag_pack' and upload it (let me know if you want me to create an upload link for you...). I would then search the KnlMsg file: the Catalog Cache grows automatically until its maximum size has been reached, and that growth should be printed in the KnlMsg file (but not as an error, so it is not part of the KnlMsgErr file...). If I see it reaching its maximum, I would strongly suggest increasing CAT_CACHE_SUPPLY.

Kind regards,
Thorsten

Former Member
0 Kudos

Hello Thorsten,

Thank you very much for your answer. We actually increased CAT_CACHE_SUPPLY yesterday to 1000000, and a few days ago we also modified our application so that Java prepared statements are closed as soon as possible after being used (which was not always the case). Both changes seem to have had a positive effect: we are getting far fewer AK Cachedirectory errors.

However, I noticed that after a DB restart, if the load is immediately very high (which is the case when we have a crash during a working day), we sometimes get a lot of AK errors very quickly, probably because the cache is only enlarged each time it almost gets full: it is initially far too small after a restart and needs to be "increased" many times.

Hence the question: is it possible to specify a minimum size for the cache, to make sure that after a restart it is big enough to "sustain" the sudden load? Also, how can I query the exact size of the catalog cache? There is the table COMMANDCACHESTATISTICS, but that is the SharedSQL cache, right?

Also, how can I reliably check how many active connections the DB has? I checked the number of sessions in the SESSIONS table, but it seems to be lower than the number of JDBC connections currently active in our application. We actually increased the parameter MaxUsers from 150 to 300, as we were getting some "connection refused" errors even though our application had only around 100 open JDBC connections (select count(*) from SESSIONS returned about 70), so much less than 150. It's a bit too early to say, but this seems to have solved that problem as well. It's not clear to me why we had connection problems.

Finally, regarding the generation of the archive: I will check that this weekend, as I don't want to generate it while all my colleagues are working.

Thanks a lot for your time, we really appreciate it.

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

changing the minimum size would not have any influence here, but maybe you should increase the CAT_CACHE_SUPPLY size even more. The size is given in bytes, so 1000000 bytes are just a bit under 1 MB size across all sessions. This size adds to the database memory heap, but if needed I would suggest to raise it considerably, maybe even to 100 MB or more...

To find out how much CAT_CACHE_SUPPLY is used by a task, just do a "select * from SESSIONS" and look at CatalogCacheUsed (displayed value is given in KB!). This value is like a 'high water mark', because the catalog cache per session never declines, it only grows up to its maximum allowed size. But this is good - just let your application run for a while and then check the cache supply per session (of course, the values are reset after a database restart).

From the database point of view, the maximum number of active connections is indeed shown in the SESSIONS table. If you see a higher number in your JDBC application, then maybe this is a result of JDBC connection pooling enabled.
In addition you could also run a 'x_cons <dbname> show active' to get a list of all active database sessions.

Kind regards,
Thorsten

PS:
the 'diag_pack' command is not known to create any performance issue on database level - it only converts some pseudo xml text files to plain text and creates a package of database error log files, which might be of interest for the error diagnosis. But of course, no need to do this right away 🙂
Let me know how it turns out with CatalogCacheUsed and a possibly increased CAT_CACHE_SUPPLY...

Former Member
0 Kudos

Hello Thorsten,

Thanks again for your answer. I might be wrong, but I always thought (from the MaxDB documentation) that the value of CAT_CACHE_SUPPLY was given in 8 KB pages, so 1'000'000 would mean 8 GB, which is probably very large.

I now generated the log archives, and would be glad if you could provide an upload link (we would prefer not to publish these files here on the forum). However I don't see any info regarding the growth of the catalog cache (I saw that for the data cache), maybe we have to activate some debug flag for that?

Thanks again for your help,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

indeed, CAT_CACHE_SUPPLY is given in 8 KB pages - sorry for the confusion, no idea what made me think it was bytes...
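For reference, the page-to-byte conversion works out as follows (a quick arithmetic sketch, assuming the 8 KB page size stated above; the class name is made up):

```java
// Editor's illustration, not MaxDB code: converting a CAT_CACHE_SUPPLY value
// given in 8 KB pages into bytes shows why 1,000,000 pages is almost 8 GB,
// not 1 MB.
class CatCacheMath {
    static final long PAGE_BYTES = 8L * 1024; // one 8 KB catalog cache page

    // Convert a CAT_CACHE_SUPPLY page count to bytes.
    static long pagesToBytes(long pages) {
        return pages * PAGE_BYTES;
    }
}
```

1,000,000 pages x 8192 bytes = 8,192,000,000 bytes, i.e. roughly 7.6 GiB.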

Here is a link for uploading the diagpkg file:
https://mdocs.sap.com/mcm/public/v1/open?shr=2wTgQrXdl2NNZUV5LjwfTIBdW0kssK3O7Rx8R2675eI

What is your current setting for CAT_CACHE_SUPPLY? You mentioned trying 1.000.000 (which would be almost 8 GB) - did it still give you errors with the nearly 8 GB cache?

Thorsten

Former Member
0 Kudos

Hi Thorsten,

Yes we still get AK errors even with 1.000.000 (8GB), and we even just got a DB crash today at about 13:29. I uploaded 2 diagpkg files: the one called diagpkg.tgz was generated at approx. 10:30, and the other one at approx. 14:00 after the crash. You can also see that very shortly after the DB restart at 13:30, we got an AK error at 13:37.

I'm still not sure that the log files contain growth debug info for the catalog cache, can you please check and let me know if we should activate some debug flag for that? Thx.

Thanks again

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

you are correct, the KnlMsg.txt file does not contain any growth information for the catalog cache - this likely means that the catalog cache does not need to grow and that the database aborts due to a corrupted catalog cache structure (the result of an as yet unknown MaxDB bug).

Unfortunately this kind of error is hard to find, as we need to catch the statement/action which corrupts the catalog cache. By the time the database finally detects the error by accessing the corrupt structure, the damage might already have been in the cache for a while.

The problem of locating the bug:
1. Technically we would need to have the database run in debug 'slow kernel' mode, which would slow down database performance drastically.
2. Also, the last crash occurred after ca. 6 hours of the database being online - this would generate a lot of log data via the 'slow kernel', probably too much to reasonably analyse.

How to proceed:
Can you try to set up a small test case where the database preferably crashes very soon (even better would be to identify the offending SQL statement)?
Maybe you could set up a new database for testing and then try to force the error as soon as possible after database start (or at least without much other activity around)? A small catalog cache size might help in forcing the issue, but then again it might not have any impact at all.

Thorsten

Former Member
0 Kudos

Hello Thorsten,

Actually, we identified at least 2 statements that lead to the "AK Cachedirectory full" error, and one causes approximately 90% of the errors. It's a simple UPDATE statement, but it's a bit of a "rough" statement because we (re)SET all the columns, even those that are not necessarily updated. This is "standard" in our application, and we do that for a few hundred tables, but somehow the AK error mainly occurs for just one table. This table is not huge: it has around 88 columns and 800K rows, 31 constraints, 17 indexes, and a few dozen foreign keys.
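The "rough" UPDATE pattern described above - (re)setting every column whether it changed or not - can be sketched as generated SQL. This is an illustrative snippet only; the table and column names are hypothetical placeholders, not the real schema:

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of the full-column UPDATE pattern: every column gets a SET clause,
// whether its value changed or not.
class FullUpdateBuilder {
    static String buildUpdate(String table, List<String> columns, String keyColumn) {
        String setList = columns.stream()
                .map(c -> c + " = ?")                 // one parameter marker per column
                .collect(Collectors.joining(", "));
        return "UPDATE " + table + " SET " + setList + " WHERE " + keyColumn + " = ?";
    }
}
```

For a table with 88 columns this produces one prepared statement with 89 parameter markers, executed for every row update regardless of which columns actually changed.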

Regarding your proposal to setup a test DB and reproduce the error, I doubt a bit that it would work because the error does not happen systematically. I will however try to "generate" the error on our productive system in debug 'slow kernel' mode during a night or Sunday when our users are not working.

Actually when the AK error occurs, the DB dumps some .stm and .dmp files, all called AKxxxxx.stm or AKxxxxx.dmp (there seems to be one file for each active session). Is there a way to analyse these dump files and maybe find out what the problem exactly was?

By the way, and also interesting: we managed to solve the problem we had with DB connections. It had nothing to do with the MaxUsers parameter: we actually had to increase the maximum allowed number of semaphore arrays on our Linux system. We identified this via the xserver log file, in which we saw the error:

ERR  11277 IPC  create_sem: semget error, No space left on device

Increasing the maximum number of semaphore arrays from 128 to 256 solved the problem. Maybe this is a silly question, but could it be that another system limit causes the bug because the DB somehow cannot get enough resources for some task?

We also see some error messages in the xserver log file like

ERR -11987 COMMUNIC session re-used, command timeout?

Is this error "normal/standard", or could it be that something is also wrong here?

Thanks again for your help,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

you can ignore the error '-11987, session re-used...', this should be a warning rather than an error, but nothing to worry about...

For the OS 'semmni' parameter, I would recommend increasing it even further, to at least 9000, to avoid any 'semget error, No space left on device' messages in the future; I am not aware of any negative side effects of high semmni values. Further details are described in SAP note 628131 - if you want, I can attach more details...
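As a side note, on Linux the semaphore limits live in the kernel.sem sysctl, which holds four whitespace-separated values in the order SEMMSL, SEMMNS, SEMOPM, SEMMNI. A small parsing sketch (an editor's illustration; the class name is made up):

```java
// Sketch: extract SEMMNI -- the maximum number of semaphore arrays discussed
// above -- from a kernel.sem value string such as "250 32000 32 9000".
// The four fields are SEMMSL, SEMMNS, SEMOPM, SEMMNI, in that order.
class SemSettings {
    static int semmni(String kernelSemValues) {
        String[] parts = kernelSemValues.trim().split("\\s+");
        return Integer.parseInt(parts[3]); // SEMMNI is the last field
    }
}
```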

Regarding the AK...stm and AK...dmp files: yes, please upload them - it might help.

If you start your system with the slow kernel, please activate the TraceCatalogCache trace option with level 7 and send us the trace after the next crash.

Regards,
Thorsten

Former Member
0 Kudos

Hello Thorsten and thanks again for your answer.

I just uploaded a .tgz archive of all the AK dump files (via the upload link you gave me last time): maybe you can see something useful?

Regarding SAP note 628131, I'd be glad if you could attach more details or just send me the content by email (if that's possible): we unfortunately don't have access to these notes.

For the "slow mode" debug, I'll probably do that in a few days when I can find a slot when nobody works.

Best regards and many thx for your time,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

I have uploaded a copy of note 628131 into the diagpack directory and will look into the AK dump files tomorrow, if time allows.

Thorsten

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

some update for you:

1. The crash might be related to the constraint 'year_id' - can you let me know how that constraint is defined?

2. Or even better, can you do a database catalog extract and upload it? This extract would write the complete database object definitions (but without actual content) to a text file.
To do so, connect via 'loadercli' as "loadercli -d <dbname> -u dbuser,pwd" and then start the extract with "catalogextract user outstream 'yourfilename'".

3. When you try to reproduce the issue using the slow kernel, please also enable the parameter CheckTaskSpecificCatalogCache by setting it to 'YES' before you start the slow kernel (and set the trace level for CheckCatalogCache to 7). This parameter change should abort (=crash) the database as soon as any inconsistency is detected (as opposed to now, when the database throws the AK Cachedirectory full error for a while and only aborts once the cache is sufficiently corrupt...).

4. I will be out of office until January 5.

Merry Christmas & Happy New Year,
Thorsten

Former Member
0 Kudos

Hi Thorsten,

I just uploaded the catalog extract. We actually have 33 constraints with the name YEAR_ID, so we need the table name to find out which one caused the problem (it might actually be the ones from tables PERSON and PERSON_LOG, because we sometimes get AK errors with these 2 tables). FYI, we actually get almost all AK errors with the table ACCOUNTTRANSACTION.

Actually, we recently copied the entire table ACCOUNTTRANSACTION into a new table, dropped the old table, and renamed the new one to ACCOUNTTRANSACTION. We then re-created all foreign keys and constraints ... but it did not help. 😞

By the way, did you check the file KnlMsgArchive in the diagpack files I uploaded? There you can see a stacktrace of the error when the DB crash occurs, it's a SIGSEGV: maybe you find out something useful there?

Thanks again for your support,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hi Christophe,

KnlMsg does not help much here, because the database only aborts when it detects the corrupt catalog cache structure; what we need to find out is during which operation the cache gets corrupted in the first place. Hence the request to start the database in slow kernel mode and with CheckTaskSpecificCatalogCache, to have the database kernel detect corruptions sooner, in the hope that we can still see the statement damaging the cache (probably some statement writes into a memory area it is not supposed to write to...).

Thorsten

Former Member
0 Kudos

Hello Thorsten,

Let me first wish you all the best for 2015!

I wanted to run our DB in "slow mode" yesterday but then realized that the slow kernel is not part of the "community edition". When I run "db_restart -slow" in dbmcli, I get the following error message:

ERR
-24994,ERR_RTE: Runtime environment error
20095,kernel program missing '/opt/sdb/MaxDB/pgm/slowknl'

I just downloaded the .tgz again (64 bits, I tried 7.8.02.39 and 7.9.08.27) and could not find the file 'slowknl'.

Is there any way we can download the slow kernel (we currently use version 7.8.02.38), or shall I just run our DB with the higher debug levels? Would this help?

Thanks again and best regards,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

Oops, it seems that we do not deliver the slowknl with the current Community Edition packages at the moment - no idea why; I do hope we can change this in the future.

I have uploaded a 'slowknl' file to the usual link, version is 7.8.02.38 for Linux x86_64, please download to the '.../pgm' directory (as indicated in your previous error message) and ensure to have the same file permissions and ownership set as for the regular 'kernel' file (also located in that dir).

Kind regards and hope it works now,
Thorsten

Former Member
0 Kudos

Hello Thorsten,

Thanks for the 'slowknl' file: I have now downloaded it. I don't know when I will next have a good slot to restart the DB in slow mode, as I have to do it when no user is working.

Actually, we have not had any DB crash for almost 10 days now. I will analyze which changes we made just before the last crashes, to find out whether one of them could have been the "solution" to our problem, or whether this is just a coincidence. We changed many DB parameters, the semaphore server config, the way we handle JDBC connections and prepared statements in our application, etc. I will let you know if I find something interesting.

Best regards,

Christophe

Former Member
0 Kudos

Hello Thorsten,

This afternoon we started to get AK errors again, so I decided to do the slow mode test this evening. I "blocked" our business application and restarted the DB in slow mode.

I then "unblocked" our business application and updated a row in the table ACCOUNTTRANSACTION (this is the table from which we almost always get "AK Cachedirectory full" errors). The DB immediately crashed and I generated the diagpkg file diagpkg-20140106.tgz, which I uploaded to the file share you provided. I hope you can find some useful information there.

Note that when I unblocked the application, some processes may have immediately accessed the DB, because there are some background threads that I cannot easily control. However, the crash happened exactly when I then updated the row (via the business application), so I think the cause is pretty clear.

Thanks again for your help, let's hope you can find that nasty problem. 😉

Best regards,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

thank you for the diagpack archive. While we are analyzing it, please also upload the corresponding AK dump files (as previously done) - just navigate to the directory where MaxDB stores the dumped files sorted by crash time...

Best regards,
Thorsten

Former Member
0 Kudos

Hi and thanks again for your answer.

I have now uploaded the AK dump files: there were only 2 sessions (= JDBC connections), because I was actually the only user connected, plus a global session for the system itself.

Best regards,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hi,

with your help we were able to recreate the 'catalog cache' crash here in our lab. This is good news, because now I am confident that we can find and fix this bug.

I will keep you informed and let you know once I have further news...

Thorsten

Former Member
0 Kudos

Hello,

Whaooo, that sounds great ... we are *really* looking forward to getting more info, that's really good news. I'm impressed that you guys managed to reproduce the bug, you really know what you're doing. 😉

Thanks again for your time, this is great support.

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello,

the bugfix is planned for MaxDB 7.8.02.42 (not yet available) and 7.9.08.28 (not yet available).

As we have just released MaxDB 7.8.02.41, that fix will probably be first released for MaxDB 7.9.

Unfortunately, the only "workaround" I can think of would be to reduce the number of constraints on that table and/or make the conditions within the constraints less complex, probably by moving that logic into the application. I will keep you informed on when we have a fix available (or remind me, if I forget); as mentioned above, we will likely have 7.9.08.28 first (in case upgrading to 7.9 would be an option for you...).

Thorsten

Former Member
0 Kudos

Hi Thorsten,

Thanks for your answer, this is great news. Do you have any clue when 7.8.02.42 might be released, or is there any possibility to get a patched binary so that we can "survive" until 7.8.02.42 is released? We would be glad to beta-test that version and give you feedback on whether it fixes our problem.

By the way, now that you mention constraints, maybe another minor problem we have is also related: sometimes when we get a constraint violation exception, the JDBC driver reports the wrong constraint name. There is usually an off-by-one: it reports the previous or next constraint of that table (I don't remember which). Do you think this is related?

Finally, there is also another minor problem that we get almost every day, not sure if this is related. We sometimes get a JDBC exception when we read a lot of rows (one at a time) from any table. The exception is

java.lang.ArrayIndexOutOfBoundsException: 41
        at com.sap.dbtech.util.StructuredBytes.getInt2(StructuredBytes.java:168) ~[sap-jdbc-7.6.09.02.jar:na]
        at com.sap.dbtech.jdbc.packet.ReplyPacket.clearPartCache(ReplyPacket.java:778) ~[sap-jdbc-7.6.09.02.jar:na]
        at com.sap.dbtech.jdbc.packet.ReplyPacket.<init>(ReplyPacket.java:52) ~[sap-jdbc-7.6.09.02.jar:na]
        at com.sap.dbtech.jdbc.packet.ReplyPacketFactory.getReplyPacket(ReplyPacketFactory.java:28) ~[sap-jdbc-7.6.09.02.jar:na]
        at com.sap.dbtech.jdbc.ConnectionSapDB.execute(ConnectionSapDB.java:653) ~[sap-jdbc-7.6.09.02.jar:7.6.09    Build 000-000-010-635]
        at com.sap.dbtech.util.GarbageCan.emptyCan(GarbageCan.java:70) ~[sap-jdbc-7.6.09.02.jar:na]
        at com.sap.dbtech.jdbc.ConnectionSapDB.execute(ConnectionSapDB.java:687) ~[sap-jdbc-7.6.09.02.jar:7.6.09    Build 000-000-010-635]
        at com.sap.dbtech.jdbc.ConnectionSapDB.execute(ConnectionSapDB.java:565) ~[sap-jdbc-7.6.09.02.jar:7.6.09    Build 000-000-010-635]
        at com.sap.dbtech.jdbc.CallableStatementSapDB.execute(CallableStatementSapDB.java:454) ~[sap-jdbc-7.6.09.02.jar:7.6.09    Build 000-000-010-635]
        at com.sap.dbtech.jdbc.CallableStatementSapDB.execute(CallableStatementSapDB.java:319) ~[sap-jdbc-7.6.09.02.jar:7.6.09    Build 000-000-010-635]
        at com.sap.dbtech.jdbc.CallableStatementSapDB.executeQuery(CallableStatementSapDB.java:763) ~[sap-jdbc-7.6.09.02.jar:7.6.09    Build 000-000-010-635]
        at com.sap.dbtech.jdbc.trace.PreparedStatement.executeQuery(PreparedStatement.java:161) ~[sap-jdbc-7.6.09.02.jar:na]

It's always that ArrayIndexOutOfBoundsException: 41 ... it's not too bad because it only happens from time to time, but it might be interesting for you to know about.

Thanks again for your great support, and please let me know about 7.8.02.42.

Best regards,

Christophe

Former Member
0 Kudos

Hello Thorsten,

Do you have any schedule regarding the availability of 7.8.02.42? (we cannot migrate to 7.9.x because this is not supported by our current Linux db server).

In case it's possible to get a patched binary already, we would of course volunteer to test the bug fix. 😉

Thanks again for your support,

Christophe

thorsten_zielke
Contributor
0 Kudos

Christophe,

MaxDB 7.8.02.42 might take a while since we have just released 7.8.02.41 and the release cycle is not as frequent as with version 7.9.
Speaking of 7.9: apart from 32-bit MaxDB, I think 7.8 and 7.9 are released for the same Linux OS levels, so maybe you can check again whether you could upgrade to 7.9.
Unfortunately we do not have the capacity to build additional hotfixes at the moment - sorry.

Next 7.8 release will probably be in about 3 to 6 months, MaxDB 7.9 likely within February 2015.

Thorsten

Former Member
0 Kudos

Hello Thorsten,

Thanks again for your answer. We of course understand the issue with the hotfixes - thanks anyway for having checked. We will wait for the next release and will maybe migrate to 7.9 (I'm currently checking with our system admin). I will leave the forum thread open and mark it as resolved once we get the new version and everything works fine.

I would like to thank you once again for your amazing support: if you ever come to Switzerland, let me know and we will invite you to discover Swiss gastronomy (with a heavyweight cheese fondue). 😉

Best regards,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

here is a link to our latest package 7.9.08.28 for Linux x86_64:

https://mdocs.sap.com/mcm/public/v1/open?shr=qLVB9Iq1L6HaK_1YLa4C4SqYINyQNg68AkLVe2S6n4M

We initially wanted to make it available the usual way for general download, but since the internal processes for upload take some time (don't ask me why...) and we will be releasing 7.9.08.29 soon, we decided to skip 7.9.08.28 here in the Community network.

However, this patch will be available for our regular SAP MaxDB customers via download and there are no serious showstoppers known so far, so feel free to download it to finally have your bug fixed.
Of course, anyone else here in the Community forum may download this patch as well (the link should be valid for 21 days...).

Best regards,
Thorsten

Former Member
0 Kudos

Hello Thorsten,

Thanks a lot for the download link. Since we're still using a 7.8 version of the DB, our system admin will install a test server and test the migration of the data to make sure that everything works fine with 7.9. I'll let you know when everything is done and works fine.

Thanks again for your great support.

Best regards,

Christophe

Former Member
0 Kudos

Hello Thorsten,

We now migrated our DB to 7.9.08.28 but we unfortunately now have another problem. We keep having SAP DBTech JDBC: [-9111] exceptions which are also reproducible. We already tried the fix mentioned here

but the UseStrategyCache parameter is already set to NO for our db.

We have also recreated all the tables for which we've had that problem but it didn't help.

Do you maybe have an idea of what this problem could be? I can of course give you the error files or even reproduce the bug in slow kernel mode if you want (but you might have to give us the compiled slow kernel binary once again).

Thanks and best regards,

Christophe

simon_matter
Participant
0 Kudos

Here is some more info on the issue from KnlMsg:

Thread 0x762C Task304  2015-02-26 12:31:10 ERR MOVECODE   20011:  Bad parameter: source size [324] dest size [32768], source addr [0x7f408377c8e4]+349, dest addr [0x7f40c1e9eb10]+229, 56 bytes to copy; module VKB62 , pos 5,_FILE=VKB62 ,_LINE=5
Thread 0x762C Task304  2015-02-26 12:31:10 ERR SYSERROR   51080:  -9111 Move error
Thread 0x762C Task304  2015-02-26 12:31:10 ERR MOVECODE   20011:  Bad parameter: source size [324] dest size [32768], source addr [0x7f408377c8e4]+349, dest addr [0x7f40c1e9eb10]+229, 56 bytes to copy; module VKB62 , pos 5,_FILE=VKB62 ,_LINE=5
                           2015-02-26 12:31:10 ERR MOVECODE   20013:  Module VKB62 , pos 5,_FILE=SAPDB_RangeCode+noPIC.cpp,_LINE=86
thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

I am sorry to hear that. Please upload a diagpack archive to:

https://mdocs.sap.com/mcm/public/v1/open?shr=wEwYg_VA7dJZKVw5BjzSBij_AXYayoChNHmP26TExog

Regards,
Thorsten

PS:
can you confirm that the database was really restarted after you changed the parameter UseStrategyCache to NO?

simon_matter
Participant
0 Kudos

Hi Thorsten,

I can confirm that "UseStrategyCache" is set to NO, which it was from the beginning because it's the default in 7.9.08.28.

BTW we already had -9111 errors some years ago but maybe they were not related.

See here

Regards,

Simon

Former Member
0 Kudos

Hi Thorsten,

Thank you for the quick answer. I have now uploaded the diagpack archive. As my colleague Simon wrote, the UseStrategyCache value is indeed set to NO. Note that we also just got a "SAP DBTech JDBC: [-9205]: System error: AK Catalog information not found" error, which should be visible in the diagpack.

Thanks again for your help,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hi Christophe,

this seems to be a new bug...

We will try to locate the problem. At the moment we suspect that the 'move' errors are triggered by access to table columns which were added later with 'NULL' as default value, where this value has not yet been changed.
This behaviour was part of a performance optimization: the default value of an added column is only actually entered once that column is accessed.
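The optimization described here can be sketched conceptually: the default of a later-added column is kept in the catalog and only consulted when the column is read, instead of being physically written into every existing row. This is an editor's illustration of the idea only, not MaxDB internals, and all names are made up:

```java
import java.util.HashMap;
import java.util.Map;

// Conceptual sketch of lazy default materialization: a column added later is
// not physically written to existing rows; reads of it fall back to the
// default recorded in the catalog until the row is actually updated.
class LazyDefaultRow {
    private final Map<String, String> stored;           // values physically present in the row
    private final Map<String, String> catalogDefaults;  // defaults of later-added columns

    LazyDefaultRow(Map<String, String> catalogDefaults) {
        this.stored = new HashMap<>();
        this.catalogDefaults = catalogDefaults;
    }

    void set(String column, String value) {
        stored.put(column, value);
    }

    // Reading a column that was never written falls back to the catalog default.
    String get(String column) {
        return stored.containsKey(column) ? stored.get(column) : catalogDefaults.get(column);
    }
}
```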

To identify all table columns which could potentially be affected, please execute the following SQL query:

select t.schemaname, t.tablename, t.createdate as tables_createdate, t.createtime as tables_createtime, t.alterdate as tables_alterdate, t.altertime as tables_altertime, c.createdate as columns_createdate, c.createtime as columns_createtime, c.alterdate as columns_alterdate, c.altertime as columns_altertime, c.columnname, c.default from tables t, columns c where t.createdate

The next step would then be to use an SQL update command to really enter 'NULL' for all rows in the affected tables - of course we will try to locate the bug in the meantime...
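The backfill step described above - explicitly writing NULL into each affected column so the default becomes physically present in every row - would amount to one UPDATE per affected table. A minimal sketch of generating such a statement (the table and column names used below are hypothetical, not the real schema):

```java
// Sketch of the backfill workaround: one UPDATE per affected table that
// explicitly writes NULL into the lazily-added column for all rows.
class NullBackfill {
    static String backfillStatement(String table, String column) {
        return "UPDATE " + table + " SET " + column + " = NULL";
    }
}
```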

Thorsten

Former Member
0 Kudos

Hello Thorsten,

Thanks for your answer, however the "where" part of your query is missing. Can you please re-post the query with the missing part?

Thanks and best regards,

Christophe

PS: in the meantime we went back to 7.8

thorsten_zielke
Contributor
0 Kudos

Oops, sorry, here is the full query:

select
  t.schemaname,
  t.tablename,
  t.createdate as tables_createdate, t.createtime as tables_createtime,
  t.alterdate as tables_alterdate, t.altertime as tables_altertime,
  c.createdate as columns_createdate, c.createtime as columns_createtime,
  c.alterdate as columns_alterdate, c.altertime as columns_altertime,
  c.columnname,
  c.default
from tables t, columns c
where t.createdate <= c.createdate
  and t.createtime < c.createtime
  and t.tablename = c.tablename
  and t.schemaname = c.schemaname
  and t.schemaname = 'your_DB_User'
  and c.default is null

Former Member
0 Kudos

Hello Thorsten,

Sorry for the very late feedback. I've been trying your workaround, but I don't think that this is the only problem. When we migrated to the new version and had so many -9111 errors, each error could be easily reproduced. Our application is now using our previous 7.8 DB again, but the new DB is still online, and I've been trying to reproduce the errors we had, in order to see whether setting the NULL columns to NULL would solve them. The problem is that I haven't managed to reproduce the errors, although a few were systematically reproducible when our production app was running with the new DB.

So it seems to me that either the load has an influence, or the DB must first reach an unstable state before the problems occur and become reproducible. In any case, with a test application and just me using it, I could not reproduce the -9111 errors.

If there is anything else I can test, please let me know.

Best regards,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christoph,

we will build you a hotfix to resolve these bugs. However, we might need to fix these problems step by step, so I can make no promises on fixing all errors with just one hotfix...

I am sorry to hear that you are facing so many problems - it seems to come down to the very specific scenario you are using MaxDB in.

Regards,
Thorsten

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

reading your last reply again, I am a bit confused about which database version you are now using. Are you saying that you have switched back to 7.8?

Did you execute the SQL statement I gave you against your 7.9 system, to find out whether you have columns which were added via 'alter table add...'? For this case we should have a fix via the mentioned hotfix. However, if you are using 7.8, you would not need that hotfix...

Thorsten

Former Member
0 Kudos

Hello Thorsten,

Thanks for the hotfix, that would be great.

Regarding these problems: couldn't you somehow "reverse engineer" them and find out what WE are doing wrong? As you say, it seems that all these problems are very specific to our application, so maybe we are doing "something wrong"? Do you maybe have a guess?

Somehow our application takes the DB into some code behaviour that leads to these errors, so maybe if we can avoid reaching this "unstable code zone", we could get rid of our problems?

Strangely enough, since we reverted our DB from 7.9 to 7.8 (after all the -9111 problems with the new version), we haven't had a single "-9400 AK Cachedirectory full" error, and only a few "-9205 AK Catalog information not found" errors. Before the update from 7.8 to 7.9, we had 2 databases running on our 7.8 instance. Since we reverted from 7.9 to 7.8, only our main DB runs on the 7.8 instance, while the other DB is still running (fine) on the new 7.9. We left that one DB on 7.9 because it only holds some log info and the volume of data was too big to be "copied back" to the 7.8 instance. It is running fine, and it's the same application that accesses the 2 databases. We also have a 3rd, smaller database now running just fine with the new 7.9 version (on a separate server).

It's really weird: the main "-9400 AK Cachedirectory full" problem has now disappeared (although we also had phases in the past where the problem disappeared for weeks and then re-appeared).

Best regards,

Christophe

thorsten_zielke
Contributor
0 Kudos

Hello Christophe,

ok, but the hotfix would be for the 'alter table add' problem. To find out if you are affected, please run the SQL query and let me know the result.

Thorsten

Former Member
0 Kudos

Hi again,

Our main DB is back on 7.8 because we simply had far too many -9111 errors with 7.9. However, we have 2 other databases now running on 7.9, and they run just fine.

And yes, I ran the "null check" query on the 7.9 instance: we had exactly 6356 such columns.

If you have a hotfix for that problem for 7.9, we could of course install it on our 7.9 instance and test everything again (we would migrate our data again and switch again from 7.8 to 7.9).

Christophe

thorsten_zielke
Contributor
0 Kudos

Ok, 6356... I have just discussed this, and we would like to once again build a special kernel with additional traces for you. This would not be a bug fix, but would help identify where exactly the errors are coming from - if you are willing to test this based on MaxDB 7.9.08.28, let me know.

Kind regards,
Thorsten

Former Member
0 Kudos

Yes, we could use that special kernel and generate more traces for you, no problem. Just let me know exactly what you expect and how we should run the tests (for example, whether we should set some parameters to special values to make sure that the DB stops when an error is detected).

Thanks,

Christophe

thorsten_zielke
Contributor
0 Kudos

Here is the kernel with additional traces - just replace your current 7.9.08.28 kernel with it:

https://mdocs.sap.com/mcm/public/v1/open?shr=Bl8zhCTx8eoKDaT2X1fjPklbbz2toL1GCya2f1TWZtw&obj=E_eQykk...


No need to configure anything on your side - no vtrace needed, no further parameters, no slow kernel; just use it as it is, and if it encounters an error, it should write the traces to the KnlMsgArchive and KnlMsg files.

Thorsten

Former Member
0 Kudos

Hello Thorsten,

Thanks for the kernel, but we unfortunately don't use MaxDB in an SAP environment and don't have access to the SAPCAR utility to uncompress .SAR files. I tried with 7-Zip (as advised on some SAP forums) but it also fails. Can you please provide the file in another format?

Thanks,

Christophe

thorsten_zielke
Contributor
0 Kudos

ok, please retry - I have uploaded a 'kernel.bin' file. Just 'kernel' was not accepted, so I renamed it. You only need to download this and rename the file back to 'kernel'.

Thorsten