So far there have been many incidents with this complaint.
However, system administrators seem to have different definitions of 'hang'.
Although such an issue can usually be resolved by a restart, a Root Cause Analysis (RCA) is usually requested afterwards.
This blog tries to sort things out for system admins.
At the very least, a system admin should know which logs to collect before the restart, so that there is still a chance for RCA.
/* 'Server hang' is definitely a gigantic topic - this blog will try not to dig into further technical details. */
1. Clearly define the symptom.
- Is it occurring only during a specific operation? Or on the whole system?
- Is it occurring only for a specific J2EE / Portal user?
- Is it occurring only on a specific client PC / browser?
- Is it occurring only for newly logged-on users? Is it also occurring for already logged-on users?
- Is it occurring with or without the load balancer?
- Is it occurring on all instances / server nodes?
- Is AS Java 'green' in SAP MMC / SAP MC?
Besides all of the above, screenshots and an HTTP Watch trace are definitely helpful.
These questions help you as well as SAP support to understand your problem.
2. How to proceed with the RCA
First, some basic rules:
- If the load balancer blocks the way -> check with the LB vendor.
- If the dispatcher / ICM / server node has died -> don't expect normal behavior. Check the work folder and defaultTrace.
- If the issue only occurs on a specific client PC / browser -> check whether the browser is supported as per the PAM (Product Availability Matrix), and check whether this PC differs from the others in any way.
- If the issue only occurs on certain instances / server nodes -> go through the steps below against that specific instance / server node.
- If the issue occurs on a consumer portal in an FPN scenario -> also check the provider system.
- Last but not least, make sure there are enough CPU/RAM/disk resources on the OS.
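The basic rules above can be read as a small triage function. The following is only a minimal sketch: the `Symptom` class, its field names and the wording of the returned checks are hypothetical and simply mirror the rules in this blog.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Symptom:
    """Answers gathered in step 1. All field names are hypothetical."""
    load_balancer_blocks: bool = False
    process_died: bool = False          # dispatcher / ICM / server node
    only_specific_client: bool = False  # one client PC / browser
    only_specific_node: bool = False    # certain instance / server node
    fpn_consumer: bool = False          # consumer portal in an FPN scenario


def first_checks(s: Symptom) -> List[str]:
    """Return the first things to check, in the order of the basic rules."""
    checks = []
    if s.load_balancer_blocks:
        checks.append("Check with the load balancer vendor")
    if s.process_died:
        checks.append("Check the work folder and defaultTrace")
    if s.only_specific_client:
        checks.append("Check browser support in the PAM and compare this PC with others")
    if s.only_specific_node:
        checks.append("Run the remaining steps against that instance / server node")
    if s.fpn_consumer:
        checks.append("Also check the provider system")
    # The resource check applies in every case.
    checks.append("Verify CPU/RAM/disk resources on the OS")
    return checks


if __name__ == "__main__":
    for step in first_checks(Symptom(process_died=True, fpn_consumer=True)):
        print("->", step)
```

The point of writing it down like this is only that the rules are independent of each other: several can apply to the same incident, and the OS resource check always applies.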
For other scenarios, to keep things simple, you can collect the traces below together.
- HTTP Watch trace
- Thread dump or SAP JVM Profiler trace, on server node (and also dispatcher for 7.0X)
- work folder
- SAP MMC Snapshot
// If you want to know why these traces are necessary:
- Scenario 1
AS Java is running, responding, but some specific application returned a blank page (browser is no longer loading the page). Other applications are working fine.
In this case, the server is not actually in a 'hang' state.
-> Collect HTTP Watch trace so that we can see where it stopped.
-> Also check PAM to see if the IE version is supported.
- Scenario 2
AS Java is running, responding, but some specific application did not respond and browser is still waiting. Other applications are working fine.
We must check where it actually hangs during the HTTP traffic - it might be on AS Java, on AS ABAP, on a 3rd party system, or simply on the network.
-> In this scenario, an HTTP Watch trace is necessary in the very first place.
-> In many cases it is indeed hanging on AS Java - see below.
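To see where a request stopped, it helps to sort the captured requests by elapsed time. The sketch below assumes the trace has been exported to the HAR format (a JSON file with `log.entries`, each carrying `time` in milliseconds, `request.url` and `response.status`). The function name and the sample URLs are made up for illustration.

```python
import json
from typing import List, Tuple


def slowest_requests(har: dict, top: int = 3) -> List[Tuple[float, int, str]]:
    """Return (elapsed ms, HTTP status, URL) for the slowest HAR entries."""
    rows = [
        (float(e["time"]), e["response"]["status"], e["request"]["url"])
        for e in har["log"]["entries"]
    ]
    # Longest-running requests first.
    return sorted(rows, key=lambda r: r[0], reverse=True)[:top]


# Hypothetical sample data; a real trace would be loaded with
#   with open("trace.har", encoding="utf-8") as f: har = json.load(f)
sample_har = {
    "log": {
        "entries": [
            {"time": 120, "request": {"url": "http://portal.example.com/irj/portal"},
             "response": {"status": 200}},
            {"time": 30000, "request": {"url": "http://portal.example.com/app/report"},
             "response": {"status": 0}},
            {"time": 85, "request": {"url": "http://portal.example.com/style.css"},
             "response": {"status": 200}},
        ]
    }
}

if __name__ == "__main__":
    for elapsed, status, url in slowest_requests(sample_har):
        print(f"{elapsed:>8.0f} ms  {status}  {url}")
```

The URL at the top of this list tells you which system to look at next: AS Java, AS ABAP, a 3rd party system, or the network in between.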
- Scenario 3
AS Java is running but not responding. Or, it is refusing new requests but still serving the old ones.
It is very likely that threads (or some specific kind of threads) are exhausted, and this can only be checked at runtime.
-> Collect a thread dump or SAP JVM Profiler trace while the issue IS OCCURRING. This is necessary to determine the root cause.
-> Collect SAP MMC Snapshot for 7.10 onwards.
-> Collect work folder logs
-> Collect defaultTrace
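Once a thread dump has been collected, a quick first look is to count the threads per state. The sketch below assumes the dump contains HotSpot-style `java.lang.Thread.State: ...` lines. The thread names and stack frames in the sample are invented, and the function name is hypothetical.

```python
import re
from collections import Counter

# Matches lines such as "   java.lang.Thread.State: TIMED_WAITING (sleeping)".
STATE_RE = re.compile(r"^\s*java\.lang\.Thread\.State: (\w+)", re.MULTILINE)


def count_thread_states(dump_text: str) -> Counter:
    """Count threads per Java thread state in a thread dump."""
    return Counter(STATE_RE.findall(dump_text))


# Invented sample; a real dump would be read from the collected file.
sample_dump = '''
"Worker-1" prio=5 tid=0x01 nid=0x11 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Cache.get(Cache.java:42)

"Worker-2" prio=5 tid=0x02 nid=0x12 waiting for monitor entry
   java.lang.Thread.State: BLOCKED (on object monitor)
        at com.example.Cache.get(Cache.java:42)

"Worker-3" prio=5 tid=0x03 nid=0x13 runnable
   java.lang.Thread.State: RUNNABLE
        at com.example.Cache.load(Cache.java:77)
'''

if __name__ == "__main__":
    for state, count in count_thread_states(sample_dump).most_common():
        print(f"{state:<15}{count}")
```

A large share of BLOCKED or WAITING threads at the same stack frame is a hint, not a verdict. Comparing several dumps taken a short time apart shows whether the same threads stay stuck.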
N.B., there is no guarantee that the logs listed above are sufficient for every issue. But they are a good start.
At least it's better than "Hey, system hang occurred. What's the root cause? We MUST prevent it."