
A short introduction

A new file system feature called "Concurrent I/O" (CIO) was introduced in the Enhanced Journaling File System (JFS2) in AIX 5L™ version 5.2.0.10, also known as maintenance level 01 (announced May 27, 2003). This new feature improves performance for many environments, particularly commercial relational databases.

If you are using Oracle 10g on JFS2 filesystems with the parameter FILESYSTEMIO_OPTIONS set to "SETALL", as recommended by SAP (SAP Note 830576), CIO is already used automatically.
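
If the parameter is not set yet, it can be changed in the spfile; a minimal example (the parameter is static, so the new value only takes effect after an instance restart):

SQL> alter system set filesystemio_options = 'SETALL' scope = spfile;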

If you want to know more about async I/O, direct I/O or concurrent I/O, please check the links in the references. There is a link to a white paper published by IBM.

Filesystems and their options

In this blog post I will focus only on the filesystems for the online redo log files. The filesystems origlogA, origlogB, mirrlogA and mirrlogB are the important ones in an SAP standard environment.

These are the logical volumes and their filesystems in my test environment:

shell> lsfs -cq /oracle/<SID>/origlogA
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/origlogA:/dev/lvorigA_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 4096:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/origlogB
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/origlogB:/dev/lvorigB_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 4096:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/mirrlogA
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/mirrlogA:/dev/lvmirrA_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 4096:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/mirrlogB
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/mirrlogB:/dev/lvmirrB_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 4096:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)

As you can see, the filesystems are created with the standard block size of 4096 bytes and are mounted with the CIO option.

Let's cross-check this with Oracle's access method:

SQL> show parameter filesystemio_options
NAME                                 TYPE        VALUE
------------------------------------ ----------- ---------
filesystemio_options                 string      SETALL

shell> lsof +fg /oracle/<SID>/origlogB
COMMAND     PID   USER   FD   TYPE          FILE-FLAG DEVICE  SIZE/OFF NODE NAME
oracle  1224804 <SID>adm   20u  VREG R,W,CIO,DSYN,LG;CX   37,7 943718912    4 /oracle/<SID>/origlogB (/dev/lvorigB_<SID>)
oracle  1224804 <SID>adm   24u  VREG R,W,CIO,DSYN,LG;CX   37,7 943718912    5 /oracle/<SID>/origlogB (/dev/lvorigB_<SID>)

The cross-check between the Oracle setting and the access method for the online redo log files matches: the redo log files are opened with the CIO flag.

The demoted I/O

So what can be "wrong" with the configuration above? To understand the problem you need to know that Oracle writes its redo information in 512-byte blocks. If you are using direct I/O or concurrent I/O, the JFS2 block size must match the requested I/O size to avoid demoted I/O. IBM describes demoted I/O as a "return to normal I/O after a direct I/O failure", which means the request is served through the normal (cached) I/O path and becomes much more expensive.
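
You can cross-check the redo block size from within Oracle; one common way (connected as SYSDBA, since it reads an x$ fixed table) is:

SQL> select max(lebsz) from x$kccle;

The column lebsz holds the log block size; on AIX it returns 512.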

Now let's check whether our system has some demoted I/Os:

# trace the JFS2 hooks 59B and 59C asynchronously for a few seconds, then stop the trace
shell> trace -aj 59B,59C
shell> trcstop
# format the raw trace file and search it for demoted I/Os
shell> trcrpt -o demoted_io.check
shell> grep demoted demoted_io.check
59B    0.001330218       0.015755                   JFS2 IO dio demoted: vp = F10001006279B7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
59B    0.001411175       0.018402                   JFS2 IO dio demoted: vp = F1000100627AB7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
59B    0.064204179       0.008152                   JFS2 IO dio demoted: vp = F10001006279B7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
...
...
59B    0.985171921       0.001468                   JFS2 IO dio demoted: vp = F1000100627AB7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
59B    1.005359694       0.030008                   JFS2 IO dio demoted: vp = F1000100627AB7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
59B    1.017411856       0.011505                   JFS2 IO dio demoted: vp = F10001006279B7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000

This trace ran for roughly 3 seconds, and you can already see several demoted I/O calls in the system.
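
If you only want a quick count of the demoted I/Os captured in the formatted trace file, counting the matching lines is enough:

shell> grep -c demoted demoted_io.check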

The new filesystems

To avoid demoted I/O calls you need to create the JFS2 filesystems with a block size of 512 bytes. Just keep in mind that this is only necessary for the online redo log filesystems. On my test system I shut down the database, moved the online redo logs to another filesystem, deleted the old redo log filesystems, recreated them with a 512-byte block size and moved the online redo log files back. Of course you can also do that online by adding additional redo log groups.
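
A minimal sketch of how such a filesystem can be recreated (using the logical volume and mount point names from my test environment; the exact crfs/chfs attributes may differ depending on your AIX level):

# create the JFS2 filesystem with a 512 byte block size on the existing logical volume
shell> crfs -v jfs2 -d lvorigA_<SID> -m /oracle/<SID>/origlogA -A no -a agblksize=512
# make concurrent I/O a persistent mount option in /etc/filesystems and mount the filesystem
shell> chfs -a options=cio,rw /oracle/<SID>/origlogA
shell> mount /oracle/<SID>/origlogA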

The newly created logical volumes and their filesystems in my test environment:

shell> lsfs -cq /oracle/<SID>/origlogA
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/origlogA:/dev/lvorigA_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 512:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/origlogB
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/origlogB:/dev/lvorigB_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 512:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/mirrlogA
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/mirrlogA:/dev/lvmirrA_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 512:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/mirrlogB
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/mirrlogB:/dev/lvmirrB_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 512:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)

Comparing the results

The results were gathered on an Oracle database 10.2.0.2.0 and AIX 5300-06-03-0732 with SAN disks on an IBM DS8000.

The test scenario

I have written a PL/SQL script that performs parallel inserts into 10 different test tables (2,000,000 rows per table) with a commit after every 2 rows per table. This parallel load is much heavier than what you will face in a productive environment, but with this simulation you can see the difference very clearly. I take an AWR snapshot before and after the load simulation; each snapshot interval is roughly 5 minutes. I run this scenario three times to get more values to compare.
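
The script itself is not included in this post, but a minimal sketch of what one of the 10 parallel insert sessions does could look like this (table and column names are hypothetical; each session works on its own table):

create table test_tab_1 (id number, payload varchar2(100));

begin
  -- 2,000,000 rows per table, commit after every 2 rows
  for i in 1 .. 2000000 loop
    insert into test_tab_1 values (i, rpad('x', 100));
    if mod(i, 2) = 0 then
      commit;
    end if;
  end loop;
  commit;
end;
/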

With a JFS2 block size of 4096

Log file sync

Runs      Total waits    Total wait time in s    Average wait in ms
First     2,023          335                     166
Second    3,555          186                     52
Third     4,194          274                     65

Log file parallel write

Runs      Total waits    Total wait time in s    Average wait in ms
First     8,085          253                     31
Second    10,272         232                     23
Third     15,189         235                     15

With a JFS2 block size of 512

Log file sync

Runs      Total waits    Total wait time in s    Average wait in ms
First     4,438          8                       2
Second    12,563         25                      2
Third     4,171          10                      2

Log file parallel write

Runs      Total waits    Total wait time in s    Average wait in ms
First     178,120        204                     1
Second    175,998        203                     1
Third     164,397        202                     1


The performance of "log file sync" and "log file parallel write" improves drastically: the average wait times drop from tens or hundreds of milliseconds to 1-2 ms.

In a normal environment the performance variability of "log file sync" / "log file parallel write" is eliminated as well, and the values stay stable (barring hardware or OS problems).

References
