Jobs run slower on application server compared to the central instance

former_member117942
Participant
0 Kudos

Hi,
We have an SAP ERP 606 system on Windows 2008 R2 and DB2 9.7, with a central instance (CI) and two application servers.
The CI has 32 GB of RAM (10 GB reserved for the DB) and 8 virtual CPUs; the application server has the same configuration (32 GB RAM and 8 virtual CPUs).

SAP parameters are the same on both CI and APPL. CI and APPL run on separate, dedicated physical hosts (no other VMs are running on the same host), so there is no interference from other virtual machines.

The same jobs take 4-5 times longer on the application server than on the CI.
Network problems have been excluded; CPU utilization on APPL is very low (10-15%).
On the CI, CPU utilization is high (between 40% and 70%), but dialog response time is good; dialog response time is also good on the application server.

We also ran another test with CI and APPL on the same virtual machine (jobs on APPL are still slower than on the CI).

Here is an example of the same report running on APPL and CI.

Any ideas?
Thanks
Maurizio

Examples

SERVER  Report              Total      CPU        DB TIME
APPL    ZWG_EXPLISTINI_NEW  852.654,0  100.589,0  759.970,0
CI      ZWG_EXPLISTINI_NEW  136.879,0   71.526,0   67.384,0
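
The figures in this example are easier to compare as ratios. A minimal Python sketch, assuming the values use European number formatting (dot as thousands separator, comma as decimal mark) and that all three columns share the same unit, which the post does not state:

```python
def parse_eu(value: str) -> float:
    """Convert a European-formatted number such as '852.654,0' to a float."""
    return float(value.replace(".", "").replace(",", "."))

# Values copied from the example above (unit not stated in the post).
stats = {
    "APPL": {"total": "852.654,0", "cpu": "100.589,0", "db": "759.970,0"},
    "CI":   {"total": "136.879,0", "cpu": "71.526,0",  "db": "67.384,0"},
}
parsed = {srv: {k: parse_eu(v) for k, v in cols.items()} for srv, cols in stats.items()}

# Slowdown of APPL relative to CI, per column.
ratios = {k: parsed["APPL"][k] / parsed["CI"][k] for k in ("total", "cpu", "db")}
# Share of total runtime spent in DB time on each server.
db_share = {srv: cols["db"] / cols["total"] for srv, cols in parsed.items()}

for k, r in ratios.items():
    print(f"{k:>5}: APPL/CI = {r:.1f}x")
for srv, s in db_share.items():
    print(f"{srv}: DB time is {s:.0%} of total")
```

On these numbers CPU time differs by roughly 1.4x while DB time differs by roughly 11x, and DB time makes up about 89% of the total on APPL versus about 49% on the CI, so the gap sits almost entirely in DB time.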

Accepted Solutions (0)

Answers (3)

Former Member
0 Kudos

Hello Maurizio,

I'm also investigating why DB+CI is faster than the AS.

We are on MS Windows 2008 R2 and SQL Server 2008 R2. For batch jobs we do not see a 4-5x slowdown, but performance on the AS is not as good as on DB+CI.

Keep us informed.

Ciao

0 Kudos

Hi Matteo,

Are you running in a virtualized environment?

Regards,
Diane Szmurlo

Former Member
0 Kudos

Following these two best practices documents might not be enough. Please take a look at the following performance study:

Although this has been done on Linux and on Oracle as a database, the effects can be similar with other OS and DB.

I strongly recommend that you work through these two documents thoroughly:

Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs

Performance Best Practices for VMware vSphere 5.1

Or if you have vSphere 5.5:

Deploying Extremely Latency-Sensitive Applications in VMware vSphere 5.5

Matt

0 Kudos

Hi Matthias,

    

In the past 18 months, we’ve completely virtualized our entire SAP landscape (ECC/EP/PI/BI) across all tiers (DEV/QA/PRD), including all APP and DB servers. Our team followed all of the Best Practices for running SAP on VMware documents.

We are now running into pockets of reported performance issues in our ECC system, and we are trying to find some measurement or metric that might truly “show” our network latency issue (other than running a job on the DB server and then running the same job on an app server). We are finding that job x might take 400 seconds when running on the DB server, while on the app servers (we have 5 of them) the runtime varies from 1200 to 2000 seconds, which is 3x to 5x longer. However, when we run these tests on Sunday morning at 3:00, when system load is very low, the runtimes drop to 2x compared to the DB server.

Our ECC DB is UCS B200 M3 / ECC APP servers are B200 M1 / vSphere is 5.0 / SQL 2008 R2 DB.

In looking at our ECC EarlyWatch report, I have noticed that the ASYNC_NETWORK_IO wait time pre-virtualization, at the same time last year, was 30M ms and is now 801M ms. I’m unable to find any specifics stating that this could be due to a 3-tier network latency issue, but I feel that it is related.

We have not followed any of the Best Practices for Performance Tuning of Latency-Sensitive Workloads in vSphere VMs. Our network team is looking for a more definitive indication that these settings are appropriate and recommended for an SAP environment. We have NOT disabled interrupt moderation on the physical NIC. We have NOT disabled interrupt coalescing on the virtual NIC, nor disabled LRO. We have NOT reduced the idle-wakeup latencies (we are also running vSphere 5.0, so we are not sure that we’d even have monitor.idleLoopSpinBeforeHalt, and we do know we don’t have the “magic” vSphere 5.5 option of setting Latency Sensitivity to “High”).

How do we gather specific data points to show that we need to make, or should be making, these changes?

   

Thank you in advance!

    

Regards

 

Doreen Anderson
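
One way to turn these runtimes into a number a network team can check is to ask how much extra delay per database round trip would explain the gap. A back-of-the-envelope Python sketch: the runtimes (400 s, 1200 to 2000 s) and wait totals (30M ms, 801M ms) come from the post above, while the round-trip count is a purely hypothetical placeholder that would have to come from a SQL trace of the actual job:

```python
def implied_extra_latency_ms(db_host_s: float, app_host_s: float, round_trips: int) -> float:
    """Extra delay per DB round trip (in ms) that would explain the runtime gap."""
    return (app_host_s - db_host_s) * 1000.0 / round_trips

DB_HOST_RUNTIME_S = 400.0            # from the post
APP_RUNTIMES_S = (1200.0, 2000.0)    # from the post (fastest and slowest app server)
ROUND_TRIPS = 2_000_000              # HYPOTHETICAL: take the real count from a trace

for app_s in APP_RUNTIMES_S:
    factor = app_s / DB_HOST_RUNTIME_S
    extra = implied_extra_latency_ms(DB_HOST_RUNTIME_S, app_s, ROUND_TRIPS)
    print(f"{app_s:.0f} s is {factor:.0f}x; implies {extra:.2f} ms extra per round trip")

# Growth of the ASYNC_NETWORK_IO wait total quoted from the EarlyWatch report.
wait_growth = 801 / 30
print(f"ASYNC_NETWORK_IO wait grew {wait_growth:.1f}x")
```

If the implied per-round-trip figure is in the same range as a measured round-trip time between an app server and the DB host, latency is a plausible explanation; if it is far larger, the cause probably lies elsewhere.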

Former Member
0 Kudos

Matteo,

Note 1056052 point #9 mentions a couple white papers - did you implement all of the changes as written in the Appendix of "Best Practices for Performance Tuning of Sensitive Workloads in vSphere Virtual Machines"?

Regards

Doreen Anderson

Former Member
0 Kudos

Hi Doreen,

Thank you for your detailed description of the issue. Yes, it seems to be network related, but of course nothing final can be said without looking seriously at your environment.

An easy latency test would be SAP NIPING (Note 500235). Note that this test does not necessarily reflect real-life performance entirely, and should therefore be seen only as a quick indicator. Round-trip times can also be measured with other generally available tools like netperf and iperf.
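
For readers who want to see what such a round-trip measurement does, here is a rough stand-in in Python. It is not NIPING, netperf, or iperf and does not reproduce their options; it is a generic TCP echo timer, and the function names, message size, and sample count are assumptions chosen for illustration. The demo runs on loopback; in practice the echo side would run on the DB host and the client on each application server.

```python
import socket
import statistics
import threading
import time

def echo_server(sock: socket.socket) -> None:
    """Accept one connection and echo back everything received."""
    conn, _ = sock.accept()
    with conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            conn.sendall(data)

def measure_rtt_ms(host: str, port: int, count: int = 100, size: int = 100) -> list:
    """Send `count` small messages and time each request/response pair."""
    payload = b"x" * size
    samples = []
    with socket.create_connection((host, port)) as s:
        s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)  # avoid Nagle delays
        for _ in range(count):
            start = time.perf_counter()
            s.sendall(payload)
            received = 0
            while received < size:
                chunk = s.recv(size - received)
                if not chunk:
                    raise ConnectionError("peer closed the connection")
                received += len(chunk)
            samples.append((time.perf_counter() - start) * 1000.0)
    return samples

# Loopback demo: start the echo side in a background thread.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))      # port 0: let the OS pick a free port
listener.listen(1)
port = listener.getsockname()[1]
threading.Thread(target=echo_server, args=(listener,), daemon=True).start()

rtts = measure_rtt_ms("127.0.0.1", port)
listener.close()
print(f"median {statistics.median(rtts):.3f} ms, max {max(rtts):.3f} ms")
```

Comparing the median and maximum from each application server against the same measurement taken on the DB host itself gives a first data point on whether per-round-trip latency differs between the tiers.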

Before starting an elaborate data gathering process, I would first give these settings a try:

  1. Use vmxnet3 as your VM NIC and disable virtual interrupt coalescing (the configuration is made in the .vmx file of the VM; see the latency-sensitive best practices).
  2. If this does not result in a major improvement, disable interrupt coalescing on the Cisco vNIC (not to be confused with the VM NIC). Note that the vmxnet3 driver is also able to steer the underlying hardware, so disabling interrupt coalescing on the Cisco vNIC is usually not necessary.
  3. Before you touch monitor.idleLoopSpinBeforeHalt, upgrade ESXi to 5.1 U2 or 5.5. The new hypervisor has some scheduling enhancements which can also help with latency-sensitive workloads.

If all this doesn't help, open a case at SAP. They'll dispatch to VMware and Cisco and can help with overall performance issues. If there's a case number already, pm me.

Kind regards,

Matt

Former Member
0 Kudos

Matt - Thank you for the very detailed approach ideas!!

Will update as we move forward with getting to the bottom of this

Regards

Doreen Anderson

Former Member
0 Kudos

Hi Matt,

In our ECC QA environment, we've tested changing interrupt coalescing on both the physical and virtual NIC, power management, and LRO. We made each of these changes one at a time and then ran SGEN and some key business transactions.

Looks like approximately 20-30% gain in timings / performance in QA and that is while not under any sort of load.

We are moving forward with these 5 changes in our PRD ECC environment this weekend and will update afterwards.

Thank you

Doreen

P.S. We've now also started reviewing RSS (receive side scaling) on the VMXNET3 adapter and the TCP chimney configuration (KB942861) for SQL DB / SAP performance: http://support.microsoft.com/KB/942861

Former Member
0 Kudos

Hi all,

On our project we have the same performance issue. Our test system has four application servers; one is the central instance, and it processes batch jobs twice as fast as the other three.

All four application servers are clones of the same template and have the same hardware (number and type of CPUs, RAM), and we checked the network connection between the application servers and the database.

Was there any solution to speed up the non-central-instance application servers?

Regards,

Edgar

0 Kudos

hi Edgar,

Could you provide details on the virtualization solution, DB, and OS version you use?

Thanks and best, Claudia

0 Kudos

Hi Maurizio,

Have you resolved this issue?  We are experiencing the same issue.

Regards,
Diane Szmurlo

former_member117942
Participant
0 Kudos

Hi Diane,

SAP replied that we should check for network issues. We did a complete check, but our network (10 Gb) works fine.

We have EHP6 on Windows/DB2. Do you have the same landscape?

We responded to SAP yesterday and are waiting for more suggestions.

Regards,

Maurizio

0 Kudos

Hi Maurizio,

We are on ECC 6.0 SP 23 (no EHP), kernel 7.20 EXT patch 417, with the DB running on the CI and 5 app servers. In August we upgraded our DB to SQL Server 2008 R2 and virtualized.

We can clearly show through traces that running a process on an app server is 5 times slower than running it on the DB. We recently conducted a Technical Performance Optimization DB Session with SAP and brought this issue up. They identified that the difference is in the OPEN operation and suggested we enter a message under component BC-DB-DBI, which I have done, but they have not picked up the message yet.

We are also researching deactivating interrupt coalescing on the VMware host. We did this for one application server in our QA system, but the performance did not change. We are not sure this means it won't help in production, because we don't have the same load in QA as in production.

Finally, we will be doing a complete analysis of the VMware settings, similar to the TPO done by SAP.

Please keep me updated as to SAP's response and I will do the same for you when we find a resolution.

Regards,
Diane Szmurlo

0 Kudos

One more question - What service pack of SQL Server 2008 R2 do you have applied?  We only have SP1.

former_member117942
Participant
0 Kudos

Hi,

we are on IBM DD2 database for Windows (also known as DB6), not on MS SQL Server.

Regards,

Maurizio

former_member117942
Participant
0 Kudos

Errata Corrige:

Hi,

we are on IBM DB2 database for Windows (also known as DB6), not on MS SQL Server.

Regards,

Maurizio

Former Member
0 Kudos

I think the bottleneck is between the application server and the database, probably due to an incorrect configuration. Although you say that network problems have been ruled out, I would still make sure. You can, for example, run a network performance measurement tool between the application server and the database server and compare the results with the corresponding measurements between the central instance and the database server.

former_member117942
Participant
0 Kudos

Hi,

Network bandwidth is 10 gigabit between the central instance and the application server.

I ran niping for an hour and the connection was fine, with no packet loss.

Network utilization is about 1-2%, so very low.

I have no idea why DB time for jobs running on the application server is so high; this is the real problem.

SAP parameters are the same; the kernel is the same.

Could it be a bug in the kernel or the DBSL library?

Thanks.

Maurizio