
A short introduction

A new file system feature called "Concurrent I/O" (CIO) was introduced in the Enhanced Journaling File System (JFS2) in AIX 5L™ version 5.2.0.10, also known as maintenance level 01 (announced May 27, 2003). This new feature improves performance for many environments, particularly commercial relational databases.

If you are using Oracle 10g on JFS2 filesystems with the parameter FILESYSTEMIO_OPTIONS set to "SETALL", as recommended by SAP (SAP Note 830576), CIO is already used automatically.
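
If the parameter is not set yet, it can be changed in the spfile; a minimal example (the parameter is static, so the new value only takes effect after an instance restart):

SQL> alter system set filesystemio_options = 'SETALL' scope = spfile;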

If you want to know more about async I/O, direct I/O or concurrent I/O, please check the links in the references. There is a link to a white paper published by IBM.

Filesystems and their options

In this blog post I will focus only on the filesystems for the online redo log files. The filesystems origlogA, origlogB, mirrlogA and mirrlogB are the important ones in an SAP standard environment.

These are the logical volumes and their filesystems in my test environment:

shell> lsfs -cq /oracle/<SID>/origlogA
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/origlogA:/dev/lvorigA_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 4096:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/origlogB
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/origlogB:/dev/lvorigB_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 4096:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/mirrlogA
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/mirrlogA:/dev/lvmirrA_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 4096:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/mirrlogB
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/mirrlogB:/dev/lvmirrB_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 4096:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)

As you can see, the filesystems are created with the standard block size of 4096 bytes and are mounted with the CIO option.

Let's cross-check this with Oracle's access method:

SQL> show parameter filesystemio_options
NAME                                 TYPE        VALUE
------------------------------------ ----------- ---------
filesystemio_options                 string      SETALL

shell> lsof +fg /oracle/<SID>/origlogB
COMMAND     PID   USER   FD   TYPE          FILE-FLAG DEVICE  SIZE/OFF NODE NAME
oracle  1224804 <SID>adm   20u  VREG R,W,CIO,DSYN,LG;CX   37,7 943718912    4 /oracle/<SID>/origlogB (/dev/lvorigB_<SID>)
oracle  1224804 <SID>adm   24u  VREG R,W,CIO,DSYN,LG;CX   37,7 943718912    5 /oracle/<SID>/origlogB (/dev/lvorigB_<SID>)

The cross-check between the Oracle setting and the access method for the online redo log files matches: the redo log files are opened with the CIO flag.

The demoted I/O

So what can be "wrong" with the configuration above? To understand the problem you need to know that Oracle writes its redo information in 512-byte blocks. If you are using direct I/O or concurrent I/O, the JFS2 block size must match the requested I/O size to avoid demoted I/O. IBM describes demoted I/O as a "return to normal I/O after a direct I/O failure", which means the request is served through the normal (cached) I/O path and becomes much more expensive.
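
You can cross-check the redo block size from within Oracle; one common way (connected as SYSDBA, since it reads an x$ fixed table) is:

SQL> select max(lebsz) from x$kccle;

The column lebsz holds the log block size; on AIX it returns 512.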

Now let's check whether our system has some demoted I/Os:

# trace the JFS2 hooks 59B and 59C asynchronously for a few seconds, then stop the trace
shell> trace -aj 59B,59C
shell> trcstop
# format the raw trace file and search it for demoted I/Os
shell> trcrpt -o demoted_io.check
shell> grep demoted demoted_io.check
59B    0.001330218       0.015755                   JFS2 IO dio demoted: vp = F10001006279B7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
59B    0.001411175       0.018402                   JFS2 IO dio demoted: vp = F1000100627AB7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
59B    0.064204179       0.008152                   JFS2 IO dio demoted: vp = F10001006279B7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
...
...
59B    0.985171921       0.001468                   JFS2 IO dio demoted: vp = F1000100627AB7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
59B    1.005359694       0.030008                   JFS2 IO dio demoted: vp = F1000100627AB7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000
59B    1.017411856       0.011505                   JFS2 IO dio demoted: vp = F10001006279B7F8, mode = 0001, bad = 0002, rc = 0000, rc2 = 0000

This trace ran for roughly 3 seconds, and you can already see several demoted I/O calls in the system.
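
If you only want a quick count of the demoted I/Os captured in the formatted trace file, counting the matching lines is enough:

shell> grep -c demoted demoted_io.check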

The new filesystems

To avoid demoted I/O calls you need to create the JFS2 filesystems with a block size of 512 bytes. Just keep in mind that this is only necessary for the online redo log filesystems. On my test system I shut down the database, moved the online redo logs to another filesystem, deleted the old redo log filesystems, recreated them with a 512-byte block size and moved the online redo log files back. Of course you can also do that online by adding additional redo log groups.
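
A minimal sketch of how such a filesystem can be recreated (using the logical volume and mount point names from my test environment; the exact crfs/chfs attributes may differ depending on your AIX level):

# create the JFS2 filesystem with a 512 byte block size on the existing logical volume
shell> crfs -v jfs2 -d lvorigA_<SID> -m /oracle/<SID>/origlogA -A no -a agblksize=512
# make concurrent I/O a persistent mount option in /etc/filesystems and mount the filesystem
shell> chfs -a options=cio,rw /oracle/<SID>/origlogA
shell> mount /oracle/<SID>/origlogA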

The newly created logical volumes and their filesystems in my test environment:

shell> lsfs -cq /oracle/<SID>/origlogA
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/origlogA:/dev/lvorigA_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 512:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/origlogB
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/origlogB:/dev/lvorigB_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 512:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/mirrlogA
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/mirrlogA:/dev/lvmirrA_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 512:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)
shell> lsfs -cq /oracle/<SID>/mirrlogB
#MountPoint:Device:Vfs:Nodename:Type:Size:Options:AutoMount:Acct
/oracle/<SID>/mirrlogB:/dev/lvmirrB_<SID>:jfs2::<SID>:3932160:cio,rw:no:no
  (lv size 3932160:fs size 3932160:block size 512:sparse files yes:inline log no:inline log size 0:EAformat v1:Quota no:DMAPI no:VIX no)

Comparing the results

The results were gathered on an Oracle database 10.2.0.2.0 and AIX 5300-06-03-0732 with SAN disks on an IBM DS8000.

The test scenario

I have written a PL/SQL script that performs parallel inserts into 10 different test tables (2,000,000 rows per table) with a commit after every 2 rows per table. This parallel load is much heavier than what you will face in a productive environment, but with this simulation you can see the difference very clearly. I take an AWR snapshot before and after the load simulation; each snapshot interval is roughly 5 minutes. I run this scenario three times to get more values to compare.
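
The script itself is not included in this post, but a minimal sketch of what one of the 10 parallel insert sessions does could look like this (table and column names are hypothetical; each session works on its own table):

create table test_tab_1 (id number, payload varchar2(100));

begin
  -- 2,000,000 rows per table, commit after every 2 rows
  for i in 1 .. 2000000 loop
    insert into test_tab_1 values (i, rpad('x', 100));
    if mod(i, 2) = 0 then
      commit;
    end if;
  end loop;
  commit;
end;
/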

With a JFS2 block size of 4096

Log file sync

Runs      Total waits    Total wait time in s    Average wait in ms
First     2,023          335                     166
Second    3,555          186                     52
Third     4,194          274                     65

Log file parallel write

Runs      Total waits    Total wait time in s    Average wait in ms
First     8,085          253                     31
Second    10,272         232                     23
Third     15,189         235                     15

With a JFS2 block size of 512

Log file sync

Runs      Total waits    Total wait time in s    Average wait in ms
First     4,438          8                       2
Second    12,563         25                      2
Third     4,171          10                      2

Log file parallel write

Runs      Total waits    Total wait time in s    Average wait in ms
First     178,120        204                     1
Second    175,998        203                     1
Third     164,397        202                     1


The performance of "log file sync" and "log file parallel write" improves drastically: the average wait times drop from tens or hundreds of milliseconds to 1-2 ms.

In a normal environment the performance variability of "log file sync" / "log file parallel write" is eliminated as well, and the values stay stable (barring hardware or OS problems).

References
