
memwatch - the system seems to be overloaded

jgleichmann
Active Contributor
0 Kudos

Hello BIA experts,

I've detected some errors in my BIA traces:

1098918208] 2008-12-09 00:47:36.146 e memwatch MemWatch.cpp(00538) : checkMem() takes long: 3s, the system seems to be overloaded. t1:1228780053,84 t2(freeSharedMem):1228780053,84 t3(mergeThrStarted):1228780053,84 t4(allUnloaded):1228780053,84 t5(diskspaceChecked):1228780056,146

I thought this error only occurs if memory usage is above 50%, or is it just a timeout of memwatch? Our SNMP data shows that memory usage across all blades never exceeds 30%. Is there a problem on the application side?

I run BIA on Rev48.

Thanks in advance!

Best Regards,

Jens

Accepted Solutions (1)

Former Member
0 Kudos

Hi.

---

1098918208] 2008-12-09 00:47:36.146 e memwatch MemWatch.cpp(00538) : checkMem() takes long: 3s, the system seems to be overloaded. t1:1228780053,84 t2(freeSharedMem):1228780053,84 t3(mergeThrStarted):1228780053,84 t4(allUnloaded):1228780053,84 t5(diskspaceChecked):1228780056,146

---

The error message tells us that the difference between the timestamps t4 and t5 is about 3 seconds: the time required to get the free disk space information from the filer.

Memwatch considers this time interval too long and unacceptable, which is why the message is written to the trace.
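As a quick sanity check of that reading (a minimal sketch, assuming the trace timestamps are epoch seconds with milliseconds after the comma; that format is an assumption, not documented TREX behaviour), the gap between t4 and t5 can be computed directly from the values in the message:

# Assumption: "1228780053,84" means 1228780053 s plus 84 ms since the epoch.
def parse_ts(raw: str) -> float:
    seconds, millis = raw.split(",")
    return int(seconds) + int(millis) / 1000.0

t4 = parse_ts("1228780053,84")   # allUnloaded
t5 = parse_ts("1228780056,146")  # diskspaceChecked

print(f"disk space check took ~{t5 - t4:.2f}s")  # ~3.06 s, matching the "takes long: 3s"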

regards,

Gennady

jgleichmann
Active Contributor
0 Kudos

Hi Gennady,

thanks for your answer. Can you tell me how I can avoid these error messages? All the errors occur at the same time of day. At that time the OS backup is running, so there is a lot of I/O on the disks.

Best Regards,

Jens

Former Member
0 Kudos

Hi, Jens

let me describe the problem as I understand it.

MemWatch is activated with its default setting:

watch_interval= 500 #millisec

That means MemWatch is expected to check memory every 500 ms and react appropriately if memory usage is not OK. It is not recommended to increase the default watch_interval.

In your case MemWatch needs more than 2000 ms just to get the information about the free disk space. While it is waiting for that response it cannot react, even if the disk space became short at the same time. I assume you do not prevent writing to the disk during the OS backup, so it is possible that the free disk capacity shrinks dangerously.
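To illustrate the mechanism (purely a sketch, not TREX source code; the warning threshold below is an assumed value): a watchdog that is supposed to run every 500 ms cannot react to anything while a slow disk-space lookup is blocking its current iteration, and that is exactly when the "takes long" warning is produced.

import shutil
import time

WATCH_INTERVAL = 0.5   # seconds, mirrors watch_interval = 500 #millisec
WARN_THRESHOLD = 2.0   # illustrative value only, not the real TREX threshold

def check_mem_once(path: str = "/") -> None:
    """One watchdog iteration: memory checks first, then the disk-space
    lookup that can stall for seconds when the filer is busy."""
    started = time.time()
    # ... shared memory / merge / unload checks would happen here ...
    shutil.disk_usage(path)                       # the potentially slow call
    elapsed = time.time() - started
    if elapsed > WARN_THRESHOLD:
        print(f"checkMem() takes long: {elapsed:.0f}s, "
              "the system seems to be overloaded.")

for _ in range(20):                               # the watchdog loop
    check_mem_once()
    time.sleep(WATCH_INTERVAL)                    # next check only after the slow call returns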

( .. Can you tell me how I can avoid these error messages? .. )

- Tune the filer to make it faster; this would be the best solution, but unfortunately I'm not sure it is technically possible.

- Set trace level memwatch=none; then you will not see any memwatch errors. Not recommended, but you can do it if the error messages are too annoying.

- Shut down TREX during the OS backup, if that is an option for your business.

- It may be possible in the newest or the next TREX release to change the memwatch setting in TrexIndexServer.ini that is responsible for the disk check, so that this error message is not written (or written less often).
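Before choosing between these options, it may be worth confirming that the warnings really cluster in the backup window. A small hypothetical helper like the following (the trace file path is a placeholder, adjust it to your installation) prints the timestamp of every memwatch warning so it can be compared against the backup schedule:

import re
import sys

# Placeholder usage: python memwatch_times.py /path/to/indexserver_trace_file
PATTERN = re.compile(r"\[?\d+\]\s+(\S+ \S+)\s+e\s+memwatch.*takes long")

with open(sys.argv[1], encoding="utf-8", errors="replace") as trace:
    for line in trace:
        match = PATTERN.search(line)
        if match:
            print(match.group(1))   # e.g. "2008-12-09 00:47:36.146"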

regards,

Gennady

jgleichmann
Active Contributor
0 Kudos

Hello,

the problem seems to be fixed with Rev 49. Maybe the reason is the new parameter unlimited_wait_allowed.

Thanks to everybody for your helpful replies!

Best Regards,

Jens

Former Member
0 Kudos

Hi Gennady,

I see you resolved one of the TREX issues for the user. Could you please provide details about the memwatch error below?

We are having this issue on our production system and have checked everything from Windows to the storage layer, but we could not find the reason for this error:

[2852] 2015-01-04 06:33:07.258 e memwatch     MemWatch.cpp(00815) : checkMem() takes long: 13s, the Trex system may be overloaded. Threshhold: 10s

                       t2(freeSharedMem):0.0s 1420378374 1420378374s 367 367ms

                       t3(mergeThrStarted):0.0s

                       t4(allUnloaded):0.0s

                       t5(diskspaceChecked):13.-109s

                       t6(checkSystemIsSwapping):0.0s

[9932] 2015-01-04 06:44:38.133 e attributes   AttributeFactory.cpp(00423) : strange flag combination found while creating an attribute; defaulting to deprecated multi value implementation; this is probably not what you want; flags index | new_multi; index ses:brp900_bus1001006_1en; attribute min_order

[9932] 2015-01-04 06:55:45.180 e TREX_C_STORE CSFile.cpp(00463) : cstore fsync failed: The specified network name is no longer available(rc=64)

[9932] 2015-01-04 06:56:22.134 e TREX_C_STORE CSFile.cpp(00761) : close('T:\index\ses\brp900_bus1001006_1\en\contentStore.tmp',wb) failed with error 113 (file sync failed), 4 global open files

[9932] 2015-01-04 06:56:22.150 e TREX_C_STORE FileHandle.cpp(02423) : ERROR: FHW::copyAndOptimize: close/sync operation failed T:\index\ses\brp900_bus1001006_1\en\contentStore.tmp

Thanks

Amair

former_member217429
Active Contributor
0 Kudos

Hi Amair,

in your case there seems to be an issue with the network connection between the TREX instance and the mapped network drive T:. Could you please check whether you see related error messages in the application or system log of Windows (Event Viewer)?
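One way to narrow this down outside of TREX (a rough diagnostic sketch; the probe path below is a placeholder next to the index directory on the mapped drive) is to repeatedly write and fsync a small file on T: and watch whether the operation stalls or fails during the problem window, similar to the failing cstore fsync in the trace:

import os
import time

TARGET = r"T:\index\_fsync_probe.tmp"   # placeholder path, adjust as needed

while True:
    started = time.time()
    try:
        with open(TARGET, "wb") as probe:
            probe.write(b"x" * 4096)
            probe.flush()
            os.fsync(probe.fileno())    # same kind of sync that fails with rc=64 in the trace
        print(f"ok, write+fsync took {time.time() - started:.3f}s")
    except OSError as exc:
        print(f"probe failed after {time.time() - started:.3f}s: {exc}")
    time.sleep(5)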
Best regards,
Mikhail

Former Member
0 Kudos

Hi Mikhail,

Thanks for the response. We looked through all Event Viewer logs and could not find any related network errors. Still, the issue persists. Is there anything else you can suggest we look at regarding the memwatch error?

Thanks

Amair

Former Member
0 Kudos

Hi Mikhail, we have already opened a message (1001612) with SAP. We are working on different options, but SAP is also not sure why there are no entries in the Event Viewer about the network errors.

Thanks

Amair

Former Member
0 Kudos

Hi Mikhail,

We have installed TREX on new 2012 servers and everything seems to be working fine. The only thing is that we see the messages below in the index trace file. Any idea what these errors are for?



[9932] 2015-01-04 06:44:38.133 e attributes   AttributeFactory.cpp(00423) : strange flag combination found while creating an attribute; defaulting to deprecated multi value implementation; this is probably not what you want; flags index | new_multi; index ses:brp900_bus1001006_1en; attribute min_order

Thanks

former_member217429
Active Contributor
0 Kudos

Hi Amair,

this seems to be a completely different issue on the application side. I would suggest creating an OSS message about it.

Best regards,
Mikhail

Answers (4)

jgleichmann
Active Contributor
0 Kudos

Hello,

the problem seems to be fixed with Rev 49. Maybe the reason is the new parameter unlimited_wait_allowed.

Thanks to everybody for your helpful replies!

Best Regards,

Jens

Edited by: Jens Gleichmann on Feb 2, 2009 11:54 AM

Former Member
0 Kudos

Hi Jens

You will get this alert when the BIA index server's memory usage reaches 100% of the physical memory size. The memory usage measured here is virtual memory usage.

How much swap space have you configured?

Regards

Karthi

Vitaliy-R
Developer Advocate
Developer Advocate
0 Kudos

Hi Jens,

It seems more like a temporary problem with getting statistics from memwatch, not an issue with overloaded memory.

Regs,

-Vitaliy

jgleichmann
Active Contributor
0 Kudos

Hi Vitaliy,

I don't think it's a temporary problem, because I have been getting this error since October 16, so it's not a one-time issue.

@Karthi:

according to our monitoring, none of the blades has ever used page space or exceeded 50%. I think it's a problem with memwatch itself, but not a temporary one.

Thanks for your answers.

Best Regards,

Jens

Former Member
0 Kudos

Hello,

When you talk about 25% memory utilization, please make sure you are comparing it to the size of the physical memory, not the overall memory.
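For example, on a Linux blade the utilization relative to physical RAM can be read straight from /proc/meminfo (a minimal sketch; subtracting Buffers/Cached is a common convention, not an SAP recommendation):

# Report used memory as a percentage of physical RAM (MemTotal), not RAM plus swap.
def read_meminfo() -> dict:
    info = {}
    with open("/proc/meminfo") as f:
        for line in f:
            key, value = line.split(":", 1)
            info[key.strip()] = int(value.split()[0])   # values are in kB
    return info

m = read_meminfo()
used_kb = m["MemTotal"] - m["MemFree"] - m.get("Buffers", 0) - m.get("Cached", 0)
print(f"physical memory in use: {100.0 * used_kb / m['MemTotal']:.1f}% "
      f"of {m['MemTotal'] // 1024} MB")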

also try:

Via RSDDBIAMON2+shell:

Check the details of the alert and see which blade had unloads, then shell into that blade, run "top" (Linux command), and see which process is consuming the most memory.

Via TREXAdmin:

1. Hosts -> Memory -> See which blades are consuming high memory

2. Usage -> Access Stats -> Sort downward and find the entry <sum>

If this does not help, try opening a customer message under the BC-BIA-TRX component.

Reg,

Dhanya

jgleichmann
Active Contributor
0 Kudos

Hello Dhanya,

thanks for your answer! Sorry for my imprecise wording; I meant that none of the blades exceeds 30% load.

In total we have 4 blades with 16 GB of memory each.

Under Hosts -> Memory I see on average 3.5 GB.

Under Usage -> Access Stats I see 6 GB in <sum>.

Not very much, as you can see. Maybe now you understand why I am wondering.

Best Regards,

Jens