DB 10.5 fp3 Upgrade Hung Due to Lock on Log File

former_member196032
Participant
0 Kudos

My attempted upgrade of an SAP prod db from LUW 9.7 FP9 to LUW 10.5 FP3 failed miserably this past weekend. The upgrade check was successful, but the actual db upgrade hung due to a lock on an archive log.

I'm wondering if anyone else has run into this phenomenon. I opened a Sev 1 PMR with IBM, but they weren't able to help me troubleshoot, so I backed out. From the start nothing went right on this server. I have done multiple large non-prod upgrades without any issues.

Here are the steps to the failed upgrade:

1. I used a response file to install the actual 10.5 software.  That failed

2.  I used "db2_install" instead (I bypassed the java GUI because it is so slow) - everything installed fine

3.  db2ckupgrade command was successful

3.  The instance upgrade kept failing with RC 1 - very cryptic

4. I pulled all the little tricks out of my DBA hat: I dropped the 9.7 instance, created a 10.5 instance, updated all the dbm config parms (I have an upgraded stage copy), and cataloged the 9.7 database successfully

5. applied the DB2 10.5 license

6. proceeded to issue the "db2 upgrade  database SID"

     - this command never finished

     after about 15 minutes I saw the following messages in the db2diag.0.log

         Log stream 0 has been marked consistent.

     followed by  

         Database has been marked consistent.

MESSAGE : DB2 is waiting for log files to be archived. DB2 was unable to confirm logs were archived. Return code -2029059911, FirstArchNum 613931, FirstArchNum2 4294967295, HeadExtentID 613934

MESSAGE : ECF=0x900001C0=-1879047744=ECF_GENREG_OPEN_OUTPUT_FILE_FAILED
          Failed to open the output registry
CALLED  : OS, -, fopen
RETCODE : ECF=0x90000513=-1879046893=ECF_GENREG_REGISTRY_DOESNT_EXIST
          The registry does not exist.
DATA #1 : String, 42 bytes
/db2/db2pse/sqllib/cfg/db2instanceinfo.reg
CALLSTCK: (Static functions may not be resolved correctly, as they are resolved to the nearest

2147153952=SQLE_RC_RU_INVALID_AL_CFG_FILE_SIZE
          "Invalid Architecture Level Configuration File size"
DATA #1 : String, 45 bytes
db2instanceinfo.reg cfg file does not exist !
DATA #2 : String, 18 bytes
/db2/db2pse/sqllib
DATA #3 : File size, 8 bytes
0
DATA #4 : File size, 8 bytes
0
DATA #5 : unsigned integer, 8 bytes
0
"Invalid Architecture Level Configuration File size"

but the actual upgrade command never returned to the command line

7.  Rebooted the server and attempted to upgrade again - same issue

8.  IBM suggested backing out, so I did and restored back to 9.7. I spent 16 hours on this - urgh
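
A side note on the return codes in step 6: db2diag.log prints most codes in both hex and signed decimal form (e.g. ECF=0x900001C0=-1879047744), but the archive message only gives the decimal value -2029059911. Below is a minimal Python sketch, assuming these are plain signed 32-bit values, that converts between the two forms so the hex value can be searched for or looked up (for example with `db2diag -rc`, if that option is available on your install). The function names are made up for illustration.

```python
def rc_to_hex(rc: int) -> str:
    """Render a signed 32-bit return code in the unsigned hex form seen in db2diag.log."""
    return f"0x{rc & 0xFFFFFFFF:08X}"


def hex_to_rc(hex_code: str) -> int:
    """Convert the hex form back to the signed 32-bit decimal form."""
    value = int(hex_code, 16)
    return value - (1 << 32) if value >= (1 << 31) else value


if __name__ == "__main__":
    # Sanity check against a pair that appears together in the log above.
    print(rc_to_hex(-1879047744))   # 0x900001C0
    # The archive return code, which the log shows only in decimal.
    print(rc_to_hex(-2029059911))   # 0x870F00B9
    print(hex_to_rc("0x90000513"))  # -1879046893
```

The first and last values match the hex/decimal pairs already printed in the log, which suggests the 32-bit assumption holds for these codes.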

BTW the issue was logged under IBM DB2 LUW PMR 82768,379,000 - Cannot connect after upgrade

At this point I'm looking for any suggestions.  Thank you

Accepted Solutions (0)

Answers (2)

MarcinOzdzinski
Participant
0 Kudos

3.  The instance upgrade kept failing with RC 1 - very cryptic

Is your prod env using virtual host names for the db instances?

If you didn't change them to the real ones in db2nodes.cfg before starting the upgrade, that's the reason.

Do you have instance upgrade logs to show ?

Regards

Marcin

former_member196032
Participant
0 Kudos

Hi Marcin,  I do use aliases, but in this case, I had already changed the alias to a "real" hostname.

thanks for the tip

JPReyes
Active Contributor
0 Kudos

Hi Anke,

First thing I would check is that whatever media/storage LOGARCHMETH1 is pointing to is available and working correctly.

Regards, JP
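
A rough way to run that check for a disk target is sketched below in Python. It assumes LOGARCHMETH1 has the form DISK:/some/path and only tests whether the directory exists, is writable by the current user (so run it as the instance owner), and has some free space. TSM or VENDOR targets need their own checks. The function name and the free-space threshold are hypothetical.

```python
import os
import shutil
import tempfile


def check_disk_archive_target(logarchmeth: str, min_free_bytes: int = 1 << 30):
    """Return a list of problems found; an empty list means the basic checks passed."""
    prefix = "DISK:"
    if not logarchmeth.upper().startswith(prefix):
        return [f"not a DISK: target, check it manually: {logarchmeth}"]

    path = logarchmeth[len(prefix):]
    if not os.path.isdir(path):
        return [f"directory does not exist: {path}"]

    problems = []
    # Write and search permission are both needed to create files in a directory.
    if not os.access(path, os.W_OK | os.X_OK):
        problems.append(f"directory is not writable by this user: {path}")
    free = shutil.disk_usage(path).free
    if free < min_free_bytes:
        problems.append(f"only {free} bytes free on the file system holding {path}")
    return problems


if __name__ == "__main__":
    # Demo against a throwaway directory; substitute the real LOGARCHMETH1 value.
    demo_dir = tempfile.mkdtemp()
    print(check_disk_archive_target("DISK:" + demo_dir, min_free_bytes=0))
```

An empty list only means the target looks usable from the operating system side; it does not prove that DB2 itself can archive to it.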

former_member196032
Participant
0 Kudos

Hi Juan, thank you for your tip. That is exactly what I did as the first check, to see if logarchmeth1 was correct, and it was. I would have expected normal DB2 behavior, i.e. if log archiving to the archmeth fails, DB2 sends the log directly to arch_fail, but it did not.

All activity stopped - everything hung completely.

IBM support found a message in the post-upgrade db2diag.0.log: the db2detaildeadlock event monitor filled up a directory on a very small volume group. I missed that error completely. But I ask myself, could a deadlock monitor really cause all processes to hang?

 

2014-12-13-14.13.45.101841-360 I38940A1431 LEVEL: Error

PID : 8978440 TID : 17221 PROC : db2sysc 0

INSTANCE: db2pse NODE : 000 DB : PSE

APPHDL : 0-31 APPID: *LOCAL.DB2.141213201406

AUTHID : DB2PSE HOSTNAME: useagan17196p

EDUID : 17221 EDUNAME: db2evmgi (DB2DETAILDEADLOCK) 0

FUNCTION: DB2 UDB, database monitor, sqm_evmon_ftarget::sqm_evmon_ftarget, probe:311

MESSAGE : ZRC=0x800D002B=-2146631637=SQLM_RC_EVFULL "monitor full of data"

DIA8052C The Event Monitor "" has reached its file capacity. Delete the files in the target directory "" or move them to another directory.

MESSAGE : ADM2017C The Event Monitor "DB2DETAILDEADLOCK" has reached its file capacity. Delete the files in the target directory "/db2/PSE/db2pse/NODE0000/SQL00001/MEMBER0000/db2event/db2detaildeadlock" or move them to another directory.
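
Since an Error-level record like this one is easy to miss in a large db2diag.0.log, here is a small Python sketch that splits the log text into records and keeps only the ones at Error level or above. It assumes every record starts with a header line like the one quoted above (timestamp, record ID, LEVEL). The function names are made up, and the Info record in the sample is invented for illustration. db2diag also has its own level filter (`-level`), which may be the simpler route where it is available.

```python
import re

# Header line of a db2diag record, e.g.
# 2014-12-13-14.13.45.101841-360 I38940A1431 LEVEL: Error
HEADER = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}-\d{2}\.\d{2}\.\d{2}\.\d{6}[+-]\d{3})"
    r"\s+\S+\s+LEVEL:\s*(?P<level>\w+)"
)


def split_records(text):
    """Split db2diag text into records, one per header line."""
    records = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADER.match(line)
        if match:
            records.append({"timestamp": match.group("ts"),
                            "level": match.group("level"),
                            "lines": [line]})
        elif records:
            # Continuation line: attach it to the most recent record.
            records[-1]["lines"].append(line)
    return records


def serious_records(text, levels=("Error", "Severe", "Critical")):
    """Keep only the records whose LEVEL is in the given set."""
    return [r for r in split_records(text) if r["level"] in levels]


SAMPLE = """\
2014-12-13-14.13.40.000000-360 I38000A300 LEVEL: Info
MESSAGE : made-up informational record, for illustration only

2014-12-13-14.13.45.101841-360 I38940A1431 LEVEL: Error
EDUID : 17221 EDUNAME: db2evmgi (DB2DETAILDEADLOCK) 0
MESSAGE : ZRC=0x800D002B=-2146631637=SQLM_RC_EVFULL "monitor full of data"
"""

if __name__ == "__main__":
    for rec in serious_records(SAMPLE):
        print(rec["timestamp"], rec["level"])
        for line in rec["lines"][1:]:
            print("   ", line)
```

To run it against a real log, read the file contents and pass the text to `serious_records` in place of `SAMPLE`.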

JPReyes
Active Contributor
0 Kudos

Ahh... Only one way to find out!!!.. 😄