Core file generated and system went down

Former Member
0 Kudos

Hi Gurus,

A huge core file was generated in the work directory and the application server went down. I deleted the core file and restarted the application; it is working fine now.

(ABAP dump) ST22

SYSTEM_CORE_DUMPED

The SAP system work directory (e.g. /usr/sap/C11/D00/work) often contains a file called core

(System logs) SM21

Signal 33 received by operating system

Run-time error "SYSTEM_NO_ROLL" occurred

Buffer SCSA already exists (Length=4096)

Can you please let me know how to prevent this in future.

Regards,

Deve

Accepted Solutions (0)

Answers (3)


Former Member
0 Kudos

Hi

We cannot prevent core files from being generated, and they can be of various sizes.

Instead, we have to schedule a job/script in crontab that deletes core files once they have been created.

Regards,

Sukarna

Former Member
0 Kudos

Hi Deve

You can schedule a job/script in crontab at OS level that removes core files larger than, for example, 50 MB or 2 GB.
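As a complement to a cleanup job, the core size itself can be capped through the shell's resource limits, which is the mechanism that actually restricts how large a core file may grow. The following is a minimal sketch, assuming a POSIX-style shell such as ksh; the helper names `show_core_limit` and `cap_core_limit` are made up for illustration, and the unit of the value depends on the shell (ksh counts 512-byte blocks). The limit has to be in effect in the environment that starts the SAP instance; on AIX a persistent per-user value is normally kept in the `core` attribute of /etc/security/limits rather than in a script.

```shell
#!/bin/sh
# Print the current soft core-file limit ("unlimited" or a block count).
show_core_limit() {
    ulimit -S -c
}

# Lower the soft core-file limit for this shell and the processes it starts.
# $1 is a block count; 0 suppresses core files entirely.
cap_core_limit() {
    ulimit -S -c "$1"
}

show_core_limit      # limit before the change
cap_core_limit 0     # illustrative value only
show_core_limit      # limit after the change
```

A value of 0 also discards the evidence that SAP support may ask for, so a moderate cap can be the better compromise.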

Bhudev

markus_doehr2
Active Contributor
0 Kudos

Core files are generated because of program errors.

- Is this the first time that error occurred?

- Which OS are you using?

- Which kernel patchlevel and database are you using?

- Is the problem reproducible?

Markus

Former Member
0 Kudos

Hi Markus,

- Is this the first time that error occurred? No, this is the second time we have faced the problem.

- Which OS are you using? AIX

- Which kernel patchlevel and database are you using? 640/247 and Oracle 9.2

- Is the problem reproducible? No

The core file size is more than 128 GB.

Regards,

Deve

markus_doehr2
Active Contributor
0 Kudos

> The core file size is more then 128gb .

Usually, if a core dump is created, there is also a corresponding dump (SYSTEM_CORE_DUMPED) in ST22. If you have such a dump, check the "Active calls in SAP kernel" section and post it; there we may see exactly where the dump occurs (in the SAP kernel or in the database client).

Most likely someone from SAP must analyze that further.

A coredump is the full content including memory of a process at the time it failed (http://en.wikipedia.org/wiki/Core_dump)

Markus

Former Member
0 Kudos

Hi Deve

Maybe you can use the idea in the following script, which we run on our servers:

kernel 700/166

OS AIX

oracle 10.2

r4lpar18:pr1adm 25> more remove_core.sh

#!/bin/ksh
# This script is to be scheduled by cron under <SID>adm to run every x minutes.
# It finds core files in the /usr/sap/<SID> directory and removes them.
#----

. $HOME/.profile

# If the user is <SID>adm, search the /usr/sap/<SID> directory.
# If the user is ora<SID>, search the $SAPDATA_HOME directory (/oracle/<SID>).
if [ "`/usr/bin/whoami | cut -c 4-6`" = 'adm' ]
then
    FINDPATH=/usr/sap/$SAPSYSTEMNAME/
elif [ "`/usr/bin/whoami | cut -c 1-3`" = 'ora' ]
then
    FINDPATH=$SAPDATA_HOME
else
    # Neither <SID>adm nor ora<SID>: no search path is defined, so stop here.
    echo "Unexpected user, nothing to search"
    exit 1
fi

# Search the path defined above.
echo "`date` Path searching.... $FINDPATH"
for i in `find $FINDPATH -name "core" -type f`
do
    echo "Execution date: " `date` "Core file: " $i
    # Record what produced the core file and how large it is before deleting it.
    file $i
    ls -l $i
    if [ $? = 0 ]
    then
        rm $i
        if [ $? = 0 ]
        then
            echo "File removed"
        else
            echo "Error deleting file!"
        fi
    else
        echo "File not found"
    fi
done

exit 0
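The header comment asks for the script to be run by cron every x minutes. A minimal sketch of such a schedule follows, assuming the script is stored as /home/pr1adm/remove_core.sh and logs to /tmp/remove_core.log (both paths are hypothetical) and an interval of 15 minutes. The minutes are listed explicitly because older cron implementations do not accept the `*/15` step syntax.

```shell
#!/bin/sh
# Hypothetical locations; adjust to the real script and log paths.
SCRIPT=/home/pr1adm/remove_core.sh
LOG=/tmp/remove_core.log

# Run at minutes 0, 15, 30 and 45 of every hour, appending output to the log.
CRON_LINE="0,15,30,45 * * * * $SCRIPT >> $LOG 2>&1"

# Print the entry for review; nothing is installed by this sketch.
printf '%s\n' "$CRON_LINE"

# To install it for the current user (for example <SID>adm), one option is:
#   ( crontab -l 2>/dev/null; printf '%s\n' "$CRON_LINE" ) | crontab -
```

Keeping the log lets you see afterwards how often core files appear, which is worth knowing even when they are deleted automatically.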

Bhudev

Former Member
0 Kudos

Hi Markus,

Active calls in SAP kernel

=> 64 bit R/3 Kernel

=> 64 bit AIX Kernel

=> Heap limit = unlimited

=> Stack limit = unlimited

=> Core limit = 2147483648

=> File size limit = unlimited

=> Heap address = 0x0x1177ca100

=> Stack address = 0xfffffffffffa150

=> Stack low = 0xfffffffffff4540

=> Stack high = 0xffffffffffff950

=> Stack Trace:

bcopy() at 0x100005bd8

ImportComplexData1__FP8CONNE_RDPC12ImportOpHeadPCvPvUlUcPP5Stack() at 0x100ba7a30

ImportComplexData1__FP8CONNE_RDPC12ImportOpHeadPCvPvUlUcPP5Stack() at 0x100ba8fa0

ImportComplexData__FP8CONNE_RDPC12ImportOpHeadPCvPvUl() at 0x100ba0978

ab_connread__FUiP11IMPORT_INFOP23EXPIMP_DATA_OBJECTS_ADMP7AB_DATAPFPUsUlP23EXPIMP_DATA_OBJECTS_ADMPP7AB_DATAPPv_iPFPPUcPUlUi9EXPO_MODE_i() at 0x100b9fb44

expo_import__Fv() at 0x100bc37a8

ab_jimpo__Fv() at 0x100bc2fd4

ab_extri__Fv() at 0x10058c954

ab_xevent__FPCUs() at 0x1007a12dc

ab_dstep() at 0x100796b5c

dynpmcal() at 0x100c4c044

dynppbo0() at 0x100c497a4

dynprctl() at 0x100c517d0

dynpen00() at 0x100217680

Thdynpen00() at 0x1000a79a8

TskhLoop() at 0x1000ac56c

tskhstart() at 0x1000c2200

DpMain() at 0x1017c4f18

nlsui_main() at 0x101795b6c

=> CPU Registers:

msr = 0xa00000000000d0b2 iar = 0x0000000100005bd8

ctr = 0x0000000000000014 lr = 0x0000000100bb27a0

xer = 0x0000000020000012 cr = 0x0000000028282288

r00 = 0x0000000000000010 r01 = 0x0fffffffffffa150

r02 = 0x00000001110ef1b0 r03 = 0x0700000030be9e98

r04 = 0x0fffffffffffb3a4 r05 = 0x0000000000000041

r06 = 0x0700000030be9ff8 r07 = 0x0020002000200020

r08 = 0x0000000000000000 r09 = 0x0000000000000000

r10 = 0x0fffffffffffa0f0 r11 = 0x000000000000012c

r12 = 0x0000000100ba78c0 r13 = 0x0000000113964da0

r14 = 0x00000001109b7e80 r15 = 0x0000000110070030

r16 = 0x07000008f5aa4ce8 r17 = 0x07000008f5b078d8

r18 = 0x0000000000000000 r19 = 0x0700000030999c78

r20 = 0x0000000111112f60 r21 = 0x000000011111cdb8

r22 = 0x000000011016fc5c r23 = 0x000000011016fc58

r24 = 0x0000000000000a7c r25 = 0x0000000110804c38

r26 = 0x00000001018fba80 r27 = 0x0000000000000000

r28 = 0x0fffffffffffa860 r29 = 0x0fffffffffffac08

r30 = 0x0000000000000208 r31 = 0x0700000030be9e98

Regards,

Deve

markus_doehr2
Active Contributor
0 Kudos

Deve,

If you post such "fragments", please put code tags (the << >> sign next to the underline) around them next time. This makes them much more readable and prevents the backend system from trying to "interpret" them. Thanks.

> bcopy() at 0x100005bd8

> ImportComplexData1__FP8CONNE_RDPC12ImportOpHeadPCvPvUlUcPP5Stack() at 0x100ba7a30

> ImportComplexData1__FP8CONNE_RDPC12ImportOpHeadPCvPvUlUcPP5Stack() at 0x100ba8fa0

> ImportComplexData__FP8CONNE_RDPC12ImportOpHeadPCvPvUl() at 0x100ba0978

> ab_connread__FUiP11IMPORT_INFOP23EXPIMP_DATA_OBJECTS_ADMP7AB_DATAPFPUsUlP23EXPIMP_DAT

So bcopy() was failing - this is an OS function (see man bcopy).
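That conclusion comes from reading the first entry under "Stack Trace", which is the function that was executing when the signal arrived. As a small illustration, assuming the "Active calls in SAP kernel" section has been saved as plain text, the following sketch prints the names of the topmost frames; `top_frames` is a hypothetical helper, not an SAP tool.

```shell
#!/bin/sh
# Print the function names of the first N frames (default 3) read from stdin.
# A frame line is assumed to look like "name() at 0xADDRESS".
top_frames() {
    awk -v n="${1:-3}" '
        / at 0x[0-9a-fA-F]+/ && count < n {
            name = $1
            sub(/\(.*/, "", name)   # drop "()" and anything after it
            print name
            count++
        }'
}

# Example with two frames copied from the trace in this thread.
top_frames 1 <<'EOF'
=> Stack Trace:
bcopy() at 0x100005bd8
ImportComplexData__FP8CONNE_RDPC12ImportOpHeadPCvPvUl() at 0x100ba0978
EOF
```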

I would

- update the kernel to a recent version (patchlevel 276)

- check with note "Note 1119631 - Using SAP systems with AIX 5.3" if your OS is on a supported level (assuming you're running 5.3, you didn't specify that)

Also as information: Oracle 9.2 is out of support.

Markus

Former Member
0 Kudos

Hi Markus,

We are using Oracle 10.2. If we update the kernel to a recent version (patchlevel 276), will this issue be resolved? Please let me know.

Regards,

Deve

markus_doehr2
Active Contributor
0 Kudos

Please read my previous post.

Neither I nor, most probably, anybody else here can tell whether the error will go away just by installing a new kernel patch. The error happens in an operating system function, so only SAP and/or IBM can tell.

Markus

Former Member
0 Kudos

Hi Markus,

I found below in workprocess trace in work directory

M stat = 4

M reqtype = 1

M act_reqtype = 1

M rq_info = 384

M tid = 323

M mode = 2

M len = 832

M rq_id = 32836

M rq_source = 1

M last_tid = 323

M last_mode = 2

M int_checked_resource(RFC) = 0

M ext_checked_resource(RFC) = 0

M int_checked_resource(HTTP) = 0

M ext_checked_resource(HTTP) = 0

M report = >ZTRGL_GL_COA_REPORT <

M action = 0

M tab_name = > <

M

M Modeinfo for User T323/M2

M

M tm state = 2

M uid = 27937

M term type = 0x4

M display = 0x8

M cpic_no = 1

M cpic_idx = 157

M usr = >PRIVERS <

M terminal = >wabelhdk0145867 <

M client = >810<

M conversation_ID = > <

M appc_tm_conv_idx = -1

M its_plugin = NO

M allowCreateMode = YES

M blockSoftCanel = NO

M imode = 1

M mode state = 0x42

M mode clean_state = 2

M task_type = ZTTADIA

M th_errno = 42

M rollout_reason = 1

M last_rollout_level = 7

M async_receives = 0

M cpic_receive = 0

M em handle = 194

M roll state = 4

M abap state = 4

M em state = 3

M eg state = 1

M spa state = 3

M enq state = 0

M softcancel = 1

M cancelInitiator = DISPATCHER

M next hook = T-1/U-1/M255

M master hook = T-1/U-1/M255

M slave hook = T-1/U-1/M255

M debug_tid = 255

M debug_mode = 0

M mode type = 0x1

M debug = 0

M tcode = >ZTFIGL10 <

M client conversation_ID = > <

M server conversation_ID = > <

M lock = 0

M max enq infos = 9

M act enq infos = 0

M em_hyper_hdl = NULL

M plugin_info = NULL

M act_plugin_hdl = -1

M act_plugin_no = 0

M max_plugin_no = 0

M

M Adresse Offset Data from APPC-CA-AREA

M -


M 0x70000002216be70 000000 060b0202 6d210000 009f0000 00000000 |....m!..........|

M 0x70000002216be80 000016 00ffffff ff000000 00000000 00001000 |................|

M 0x70000002216be90 000032 00000000 00000000 38373931 32343336 |........87912436|

M 0x70000002216bea0 000048 00000000 00000000 00000000 00000000 |................|

M 0x70000002216beb0 000064 00000000 00000000 00000000 009d007b |...............{|

M 0x70000002216bec0 000080 2a455252 2a003100 54685369 6748616e |*ERR*.1.ThSigHan|

M 0x70000002216bed0 000096 646c6572 3a207369 676e616c 00313100 |dler: signal.11.|

M 0x70000002216bee0 000112 5461736b 68616e64 6c657200 36343000 |Taskhandler.640.|

M 0x70000002216bef0 000128 31007468 78786865 61642e63 00393839 |1.thxxhead.c.989|

M 0x70000002216bf00 000144 33000057 65642041 70722020 31203136 |3..Wed Apr  1 16|

M 0x70000002216bf10 000160 3a31323a 34302032 30303900 00000037 |:12:40 2009....7|

M 0x70000002216bf20 000176 00534150 2d536572 76657220 70726f76 |.SAP-Server prov|

M 0x70000002216bf30 000192 70723035 5f565052 5f303020 6f6e2068 |pr05_VPR_00 on h|

M 0x70000002216bf40 000208 6f737420 70726f76 70723035 20287770 |ost provpr05 (wp|

M 0x70000002216bf50 000224 20313429 00000000 002a4552 522a0000 | 14).....*ERR*..|

M 0x70000002216bf60 000240 20002000 20002000 20002000 20002000 | . . . . . . . .|

M 0x70000002216bf70 000256 20002000 20002000 20002000 20002000 | . . . . . . . .|

M 0x70000002216bf80 000272 20002000 08000601 00007300 61007000 | . .......s.a.p.|

M 0x70000002216bf90 000288 63006f00 72006500 5f005600 50005200 |c.o.r.e._.V.P.R.|

M 0x70000002216bfa0 000304 5f003000 30000000 00000000 00000000 |_.0.0...........|

M 0x70000002216bfb0 000320 00000000 00000000 00000000 00000000 |................|

M 0x70000002216bfc0 000336 00000000 00000000 00000000 00000000 |................|

M 0x70000002216bfd0 000352 00000000 00000000 00000000 00000000 |................|

M 0x70000002216bfe0 000368 00000000 00000000 00000000 00000000 |................|

M 0x70000002216bff0 000384 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c000 000400 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c010 000416 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c020 000432 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c030 000448 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c040 000464 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c050 000480 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c060 000496 00000000 00000000 00000000 00000000 |................|

M -


M PfStatDisconnect: disconnect statistics

M Entering ThSetStatError

M ThIErrHandle: don't try rollback again

M Entering ThReadDetachMode

M *** ERROR => ThIErrHandle: bad value for th_act_em_hdl (194), detach T323/M2 [thxxhead.c 10141]

M

M Modeinfo for User T323/M2

M

M tm state = 2

M uid = 27937

M term type = 0x4

M display = 0x8

M cpic_no = 1

M cpic_idx = 157

M usr = >PRIVERS <

M terminal = >wabelhdk0145867 <

M client = >810<

M conversation_ID = > <

M appc_tm_conv_idx = -1

M its_plugin = NO

M allowCreateMode = YES

M blockSoftCanel = NO

M imode = 1

M mode state = 0x42

M mode clean_state = 2

M task_type = ZTTADIA

M th_errno = 42

M rollout_reason = 1

M last_rollout_level = 7

M async_receives = 0

M cpic_receive = 0

M em handle = 194

M roll state = 4

M abap state = 4

M em state = 2

M eg state = 1

M spa state = 3

M enq state = 0

M softcancel = 1

M cancelInitiator = DISPATCHER

M next hook = T-1/U-1/M255

M master hook = T-1/U-1/M255

M slave hook = T-1/U-1/M255

M debug_tid = 255

M debug_mode = 0

M mode type = 0x1

M debug = 0

M tcode = >ZTFIGL10 <

M client conversation_ID = > <

M server conversation_ID = > <

M lock = 0

M max enq infos = 9

M act enq infos = 0

M em_hyper_hdl = NULL

M plugin_info = NULL

M act_plugin_hdl = -1

M act_plugin_no = 0

M max_plugin_no = 0

M

M Adresse Offset Data from APPC-CA-AREA

M -


M 0x70000002216be70 000000 060b0202 6d210000 009f0000 00000000 |....m!..........|

M 0x70000002216be80 000016 00ffffff ff000000 00000000 00001000 |................|

M 0x70000002216be90 000032 00000000 00000000 38373931 32343336 |........87912436|

M 0x70000002216bea0 000048 00000000 00000000 00000000 00000000 |................|

M 0x70000002216beb0 000064 00000000 00000000 00000000 009d007b |...............{|

M 0x70000002216bec0 000080 2a455252 2a003100 54685369 6748616e |*ERR*.1.ThSigHan|

M 0x70000002216bed0 000096 646c6572 3a207369 676e616c 00313100 |dler: signal.11.|

M 0x70000002216bee0 000112 5461736b 68616e64 6c657200 36343000 |Taskhandler.640.|

M 0x70000002216bef0 000128 31007468 78786865 61642e63 00393839 |1.thxxhead.c.989|

M 0x70000002216bf00 000144 33000057 65642041 70722020 31203136 |3..Wed Apr  1 16|

M 0x70000002216bf10 000160 3a31323a 34302032 30303900 00000037 |:12:40 2009....7|

M 0x70000002216bf20 000176 00534150 2d536572 76657220 70726f76 |.SAP-Server prov|

M 0x70000002216bf30 000192 70723035 5f565052 5f303020 6f6e2068 |pr05_VPR_00 on h|

M 0x70000002216bf40 000208 6f737420 70726f76 70723035 20287770 |ost provpr05 (wp|

M 0x70000002216bf50 000224 20313429 00000000 002a4552 522a0000 | 14).....*ERR*..|

M 0x70000002216bf60 000240 20002000 20002000 20002000 20002000 | . . . . . . . .|

M 0x70000002216bf70 000256 20002000 20002000 20002000 20002000 | . . . . . . . .|

M 0x70000002216bf80 000272 20002000 08000601 00007300 61007000 | . .......s.a.p.|

M 0x70000002216bf90 000288 63006f00 72006500 5f005600 50005200 |c.o.r.e._.V.P.R.|

M 0x70000002216bfa0 000304 5f003000 30000000 00000000 00000000 |_.0.0...........|

M 0x70000002216bfb0 000320 00000000 00000000 00000000 00000000 |................|

M 0x70000002216bfc0 000336 00000000 00000000 00000000 00000000 |................|

M 0x70000002216bfd0 000352 00000000 00000000 00000000 00000000 |................|

M 0x70000002216bfe0 000368 00000000 00000000 00000000 00000000 |................|

M 0x70000002216bff0 000384 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c000 000400 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c010 000416 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c020 000432 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c030 000448 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c040 000464 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c050 000480 00000000 00000000 00000000 00000000 |................|

M 0x70000002216c060 000496 00000000 00000000 00000000 00000000 |................|

M -


M call ThrShutDown ...

M ThDBDisconnect: disconnect from data base

B Disconnecting from ALL connections:

B Wp Hdl ConName ConId ConState TX PRM RCT TIM MAX OPT Date Time DBHost

B 014 000 R/3 000000000 ACTIVE YES YES NO 000 255 255 20090329 014522 <Central server>

C Disconnecting from connection 0 ...

C Close user session (con_hdl=0,svchp=0x1146d3388,usrhp=0x1146dd998)

C Detaching from DB Server (con_hdl=0,svchp=0x1146d3388,srvhp=0x1146d3608)

C Now I'm disconnected from ORACLE

B Disconnected from connection 0

B statistics db_con_commit (com_total=1789, com_tx=261)

B statistics db_con_rollback (roll_total=50, roll_tx=0)

M ThDBDisconnect: disconnect o.k.

M ***LOG Q02=> wp_halt, WPStop (Workproc14 1183958) [dpuxtool.c 318]

M Good Bye .....

Can you please suggest what to do?

Regards,

Deve

markus_doehr2
Active Contributor
0 Kudos

Deve,

Look at what you posted. In my previous answer I asked you to put code tags around such text postings; nobody can read what you posted.

And again, I can only repeat myself: the OS signaled the process to be killed due to a problem in the bcopy() routine, which is an AIX system function.

Markus

Former Member
0 Kudos

Hi Markus,

I found the following about "signal 33 received",

which describes various processes dying with signal 33.

Cause

Not enough paging space configured.

Resolving the problem

Paging space needed to be increased. The following are things to watch for when increasing paging space.

1. If processes die due to having received signal 33 (SIGDANGER). To verify this, use the errpt command to view the system error log entries. When the system sends a signal 33, it indicates that the system has only about 2 MB of free page space. Shortly after the system sends a signal 33, the system starts killing the most current processes.

2. If any of the following messages are displayed:

-INIT: Paging space is low

-ksh: cannot fork no swap space

-Not enough memory

-Fork function failed

-fork() system call failed

-Unable to fork, too many processes

-Fork failure -not enough memory available

-Fork function not allowed. Not enough memory available.

-Cannot fork: Not enough space

-signal 33 received

-SIGDANGER received
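The symptoms listed above can be watched for before processes start dying. On AIX, `lsps -s` summarises paging space usage and `errpt` shows the error log mentioned in point 1. The sketch below is only an illustration: `paging_used_pct` and `check_paging` are hypothetical helper names, the 80% threshold is arbitrary, and the sample figures are made up to show the two-line layout that `lsps -s` is assumed to print.

```shell
#!/bin/sh
# Extract the "Percent Used" number from `lsps -s` style output on stdin.
paging_used_pct() {
    awk 'NR == 2 { sub(/%/, "", $2); print $2 }'
}

# Read `lsps -s` style output on stdin and compare it with a threshold.
# Returns 0 when below the threshold, 1 when at or above it, 2 if unparsable.
check_paging() {
    threshold=${1:-80}
    used=$(paging_used_pct)
    case "$used" in
        ''|*[!0-9]*) echo "Could not parse paging space usage"; return 2 ;;
    esac
    if [ "$used" -ge "$threshold" ]; then
        echo "WARNING: paging space ${used}% used"
        return 1
    fi
    echo "OK: paging space ${used}% used"
}

# Demonstration with made-up figures; on AIX use: lsps -s | check_paging 80
check_paging 80 <<'EOF'
Total Paging Space   Percent Used
      16384MB              12%
EOF
```

Run regularly, such a check gives a warning while there is still time to find out which process is consuming the memory.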

Can you advise?

Regards,

Deve

markus_doehr2
Active Contributor
0 Kudos

> Hi Markus,

>

> I find signal 33 received

> reflecting various processes dying with signal 33.

>

> Cause

> Not enough paging space configured.

>

> Resolving the problem

> Paging space needed to be increased. The following are things to watch for when increasing paging space.

> Can you suggest me.

I suggested several things, which you seem to simply ignore while posting the same information again and again:

- create an OSS call to let support have a look at the system

- update your SAP kernel to the latest version

- check your OS level patches according to the given note

- create a call with IBM to let them have a look at the system

The system core dumps because there is not enough paging space for the current operation. You may fix that symptom by increasing the paging space, but that does not explain why so much paging space is needed.

Nobody but SAP and/or IBM can find that out.

Markus

Former Member
0 Kudos

Hi Deve

Why not try the idea from the script for deleting core files, which we already use on our servers, as posted in my earlier reply and as suggested by Sukarna too?

Bhudev