on 09-08-2011 12:05 PM
Hello All,
I have a peculiar problem,
I recently performed a hardware migration by installation: a CI plus 2 dialog instances.
After the migration, everything was fine, both the CI and App servers came up without issues.
I then performed a kernel upgrade. Before bringing up all the instances, I started only the CI and imported the profiles of all active servers; the import covered only the CI because the dialog instances were down.
After that I tried to start the dialog servers and found that they are unable to come up.
Please find the errors in dev_w0 below:
dev_w0
SHM_PRES_BUF (addr: 0x2bb9dc5000, size: 20000000)
I *** ERROR => shmget(11509,1073741824,992) (28: No space left on device) [shmux.c 1556]
M *** ERROR => ThShMCreate: ShmCreate SHM_ROLL_AREA_KEY failed [thxxhead.c 2577]
M *** ERROR => ThIPCInit: ThShMCreate [thxxhead.c 2074]
M ***LOG R19=> ThInit, ThIPCInit ( TSKH-IPC-000001) [thxxhead.c 1523]
M in_ThErrHandle: 1
M *** ERROR => ThInit: ThIPCInit (step 1, th_errno 17, action 3, level 1) [thxxhead.c 10468]
M
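To make the failing call easier to read, here is a minimal Python sketch that decodes the numbers in the `shmget(11509,1073741824,992)` trace line. It assumes the trace prints the arguments in the order of the standard `shmget(key, size, shmflg)` signature. The constant `IPC_CREAT = 0o1000` is the Linux value written out by hand, and `decode_shmget` is a hypothetical helper name, not part of any SAP tool.

```python
import errno

IPC_CREAT = 0o1000  # Linux value from <sys/ipc.h>; written by hand (assumption)


def decode_shmget(key: int, size: int, flags: int, err: int) -> dict:
    """Split the traced shmget arguments into readable parts."""
    return {
        "key_hex": hex(key),                           # key as ipcs would show it
        "size_gib": size / 2**30,                      # requested segment size
        "creates_segment": bool(flags & IPC_CREAT),    # is IPC_CREAT set?
        "mode_octal": oct(flags & 0o777),              # permission bits
        "errno_name": errno.errorcode.get(err, "unknown"),
    }


info = decode_shmget(11509, 1073741824, 992, 28)
for name, value in info.items():
    print(f"{name}: {value}")
```

Read this way, the work process asked to create a 1 GiB segment with mode 740 and got ENOSPC back. According to the Linux shmget(2) manual page, ENOSPC means either that all shared memory IDs are taken (SHMMNI) or that the new segment would push the system past its system-wide shared memory limit (SHMALL), so the message is not about disk or swap space.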
I found many posts related to this error and checked the entries in the sysctl.conf file, which look fine.
The swap space also looks fine to me, as shown below:
/sbin/swapon -s
Filename Type Size Used Priority
/dev/cciss/c0d0p2 partition 20972848 898500 -1
/var/swapfile0 file 10485752 0 -2
/var/swapfile1 file 10485752 0 -3
/var/swapfile2 file 10485752 0 -4
Please help me with this.
Edited by: Juan Reyes on Sep 8, 2011 12:17 PM
Seems like your shared memory segment is running out of space...
I would stop the system and then clear the shared memory using cleanipc. Then use the ipcs command to list the status of the shared memory; if you still see entries for <sid>adm, remove them with the ipcrm command and try starting the system again.
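As a rough illustration of the ipcs check, the Python sketch below picks the segments owned by a given user out of `ipcs -m` style output and prints the matching `ipcrm -m` commands without running them. The sample text, the owner `abcadm`, and the helper name `segments_owned_by` are all made up for the example; column layout can differ between ipcs versions, so treat the field positions as an assumption.

```python
def segments_owned_by(ipcs_output: str, owner: str) -> list:
    """Return (shmid, bytes) for each shared memory segment owned by `owner`."""
    found = []
    for line in ipcs_output.splitlines():
        fields = line.split()
        # Data rows start with a hex key such as 0x00002cf5; skip headers and blanks.
        if len(fields) >= 5 and fields[0].startswith("0x") and fields[2] == owner:
            found.append((int(fields[1]), int(fields[4])))
    return found


# Hypothetical `ipcs -m` output, not taken from the system in this thread.
SAMPLE = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00002cf5 32768      abcadm     740        1073741824 0
0x00002cf6 65537      abcadm     740        20000000   2
0x00000000 98306      root       600        4096       1          dest
"""

leftovers = segments_owned_by(SAMPLE, "abcadm")
for shmid, size in leftovers:
    # Only print the command; removing segments is left to the administrator.
    print(f"ipcrm -m {shmid}   # {size} bytes")
```

A non-zero nattch value means some process is still attached to the segment, which is one reason to stop the whole system before removing anything.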
Regards
Juan
Hello Samant,
Is this system connected to any portal or backend system? As a check, see whether any process is still in use by the backend or connected system. If so, kill it and then shut down the whole system, including the CI and the application instances.
Also clean up the memory with cleanipc and ipcrm.
Wait for some time and then start the system again.
Hope this will help to resolve this issue.
Thanks and regards,
ram
Hi,
Please explore on the following if you haven't done before.
1. Ensure that all filesystems on your dialog instance server are mounted properly and have enough free space left.
2. Perform a complete reboot of your dialog instance.
3. Try to restart your dialog instance.
4. Execute sappfpar check on the DI to see whether any memory-related errors occur.
Good Luck..
Best Regards,
Vasanth G
Edited by: Vasanth Govindaraj on Sep 9, 2011 10:44 AM
Hi Vasanth.
1. Ensure that all filesystems on your dialog instance server are mounted properly and have enough free space left.
Checked.
2. Perform a complete reboot of your dialog instance.
Cannot do it now; I will get back to you on this.
3. Try to restart your dialog instance.
4. Execute sappfpar check on the DI to see whether any memory-related errors occur.
sappfpar check was run; no errors were reported.
Thanks,
Samant kumar
Hi Juan,
I extracted kernel 201, changed the permissions, and recreated the softlinks that were present before I deleted the old kernel.
However, we are migrating 2 systems on the same host. I do not know what happened, but the DIs have now come up properly with kernel 201 itself. I do not know what changed. Thanks for helping though!
@All,
Thanks all for your help; the issue resolved itself without any change. I think this thread provides almost all the information required to resolve a similar issue!
Regards
Samant Kumar
Hello All,
The problem is related to the number of instances on the host. One instance can be brought up only after another is brought down. What should I check now?
Regards,
Samant Kumar
Hi Juan,
please find the hardware details below:
Hardware details
LSB Version: :core-3.0-amd64:core-3.0-ia32:core-3.0-noarch:graphics-3.0-amd64:graphics-3.0-ia32:graphics-3.0-noarch
Distributor ID: RedHatEnterpriseAS
Description: Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
Release: 4
Codename: NahantUpdate7
memory : MemTotal: 65846512 kB
MemFree: 888828 kB
8 processor 64-Bit AMD Opteron Dual Core
Regards,
Samant Kumar
Hi,
Did you run the rfpar program check? Also make sure the size of shared memory pool 10 is not more than 2 GB.
Regards,
Vamshi.
Hi,
As pointed out by Juan, this is a shared memory problem. Can you please check the OS limits? I think the OS is Linux.
We once had an issue where shmmax was set to a very low value.
Old Hardware
LSB Version: :core-3.0-amd64:core-3.0-ia32:core-3.0-noarch:graphics-3.0-amd64:graphics-3.0-ia32:graphics-3.0-noarch
Distributor ID: RedHatEnterpriseAS
Description: Red Hat Enterprise Linux AS release 4 (Nahant Update 7)
Release: 4
Codename: NahantUpdate7
memory : MemTotal: 65846512 kB
MemFree: 888828 kB
8 processor 64-Bit AMD Opteron Dual Core
The new hardware is the same as the old Hardware
SAP settings
kernel.shmmax=23136829430
kernel.msgmni=1024
kernel.sem=1250 256000 100 2048
kernel.shmall=8388608
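The two shm settings use different units, which is easy to miss: kernel.shmmax is in bytes and limits a single segment, while kernel.shmall is in pages and limits all segments on the host together. The sketch below converts both to GiB and sets them next to the MemTotal value posted in this thread. It assumes a 4096-byte page size (typical for x86_64; `getconf PAGE_SIZE` shows the real value).

```python
PAGE_SIZE = 4096  # bytes; assumed x86_64 default, not confirmed in the thread

settings = {
    "kernel.shmmax": 23136829430,  # bytes, maximum size of one segment
    "kernel.shmall": 8388608,      # pages, system-wide total for all segments
}
mem_total_kb = 65846512  # MemTotal from the hardware details in this thread

GIB = 2**30
shmmax_gib = settings["kernel.shmmax"] / GIB
shmall_gib = settings["kernel.shmall"] * PAGE_SIZE / GIB
ram_gib = mem_total_kb * 1024 / GIB

print(f"shmmax (one segment):  {shmmax_gib:.1f} GiB")
print(f"shmall (all segments): {shmall_gib:.1f} GiB")
print(f"physical RAM:          {ram_gib:.1f} GiB")
```

Under that page-size assumption, all shared memory on the host is capped at 32 GiB, well below the installed RAM. If several SAP instances on one host each allocate large segments, their total can reach that cap, and shmget then fails with "No space left on device". That would be consistent with one instance starting only after another is stopped, although the thread does not confirm that shmall is the actual cause. `ipcs -l` lists the limits in effect and `ipcs -m -u` summarizes current usage.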