When extracting files to NAS storage, other operations slow down significantly

jin-up_shim
Explorer
0 Kudos

http://me2.do/5toO17tq

Hi

One of our customers has a performance issue when extracting large files (larger than about 20 GB) to NAS storage.

They are not using NAS storage for dbspaces. They only use it for extracting files with the temp_extract_name1 option.

They know that using NAS storage gives them poor extract performance as well.

They don't expect the extraction itself to run faster, but the extraction affects the performance of other operations.

Even though there is only one additional extract operation compared to normal, other operations take 5 to 10 times longer.


I have two sets of sp_iqsysmon results and two nmon monitoring reports.

One is for the day the issue occurred (4/23), the other is for a normal day (4/24).

Both cover 6 P.M. to 8 P.M.

The files can be downloaded from the URL at the top of this post.

Which is responsible for this issue: IQ, the OS, or the storage?

Thank you.

Accepted Solutions (0)

Answers (2)

Gisung
Advisor
0 Kudos

Hi,

Have you tested with a named pipe when extracting files?

Please let me know if it's the same as before.

==

Gi-Sung Jang

jin-up_shim
Explorer
0 Kudos

Hi,

I didn't try with a named pipe.

But I did try on SAN storage. It didn't cause the issue.

Thank you.

markmumy
Advisor
0 Kudos

I wouldn't expect a named pipe to increase performance given how small a buffer it has and how it generally stops and starts the data movement process.

Based on what you describe, Jason's comments are likely spot on.  When IQ is writing out to slow storage, OS level issues creep in.  IQ will open up the output file(s) for write.  During that process, the OS will handle all IO.  When the OS has to wait on storage, it goes into a wait state.  On some systems, that 'wait' may block the CPU core from doing any other processing.  This would then manifest itself in an IQ system that was able to use fewer resources because those CPU resources are all waiting on storage now.

Mark

jin-up_shim
Explorer
0 Kudos

You said that on some systems, the 'wait' may block the CPU core from doing any other processing.

How about on AIX?

Their IQ server is on AIX 6.1.4.3.

I have read some articles saying that on AIX, even if CPU resources are in a wait state, they can be used by any other process's request.

If so, in this case, why couldn't IQ use the waiting CPU resources?

Was it because the request causing the wait (the extract SQL) and the other requests (other SQL statements) were run by the same process, iqsrv15?

markmumy
Advisor
0 Kudos

The issue is completely independent of IQ.  The multithreaded cores that are in the market today (p-series, Intel, SunOracle, etc) typically leverage the wait states for compute resources.  If a thread is waiting on IO, it will put another thread on the core for CPU processing.  Should that thread need IO, it will wait.  You will only get so many processor threads that can wait.  On IBM p-series it is 2, 4, or 8 threads per core (SMT2, SMT4, or SMT8) depending on the machine and chip type.

The one thing that no hardware will do is have a near infinite number of threads waiting on IO.  With the slowdown in extractions it will ripple into all other IQ activity because the hardware (this is way below the OS even) only has so many IO waiting slots. As you use those, the wait time increases.  And will continue to increase until you have no more IO wait slots and the CPUs are spinning constantly checking on IOs.

OK, that's a gross generalization, but it goes to illustrate the point that if you have slow enough storage it will ripple into every process on the  host, even those that should be on the CPU all the time.
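To make that ripple effect concrete, here is a minimal toy model in Python. It is only a sketch of the generalization above, not a description of how AIX or the p-series hardware actually schedules work. The function name, the fixed number of "slots", and all timings are assumptions chosen for illustration: a few jobs that block on storage are dispatched first, and the CPU-only jobs queued behind them finish later the longer those waits last.

```python
import heapq

def cpu_job_finish_times(slots, io_jobs, io_wait, cpu_jobs, cpu_time):
    """Toy model (hypothetical): `slots` hardware threads, each holding one
    job until it finishes. I/O-bound jobs are dispatched first and hold a
    slot for `io_wait` seconds; CPU-bound jobs need `cpu_time` seconds.
    Returns the finish times of the CPU-bound jobs only."""
    free_at = [0.0] * slots          # time at which each slot becomes free
    heapq.heapify(free_at)
    finish = []
    jobs = [("io", io_wait)] * io_jobs + [("cpu", cpu_time)] * cpu_jobs
    for kind, duration in jobs:
        start = heapq.heappop(free_at)   # take the earliest free slot
        end = start + duration
        heapq.heappush(free_at, end)
        if kind == "cpu":
            finish.append(end)
    return finish

# Same CPU workload, once behind fast storage and once behind slow storage.
fast = cpu_job_finish_times(slots=4, io_jobs=4, io_wait=0.01, cpu_jobs=8, cpu_time=0.1)
slow = cpu_job_finish_times(slots=4, io_jobs=4, io_wait=1.0, cpu_jobs=8, cpu_time=0.1)
print(max(fast), max(slow))
```

In this model the CPU-only jobs never touch storage, yet their completion time grows with the storage wait, which is the point being made.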

I would look at having a dedicated network for the NAS, using another network protocol that is more efficient for large IO, using another filesystem that will do larger IO, or extracting data locally and then copying it to the NAS.  Something that allows IQ to continue processing efficiently.
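As a rough sketch of the "extract locally, then copy" option: once IQ has written the extract to a local file system, a separate process could move it to the NAS in large sequential blocks, so the slow writes happen outside the IQ server process. The helper name, the 8 MB block size, and the paths in the usage comment are assumptions for illustration, not tested recommendations.

```python
import os
import shutil

def copy_in_large_blocks(src_path, dst_path, block_size=8 * 1024 * 1024):
    """Copy src_path to dst_path with large sequential reads and writes.
    Returns the size in bytes of the copied file."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, block_size)  # block_size is the buffer length
        dst.flush()
        os.fsync(dst.fileno())                    # force the data out to storage
    return os.path.getsize(dst_path)

# Hypothetical usage, after the extract has finished on local disk:
# copy_in_large_blocks("/local/extract/out.dat", "/mnt/nas/extract/out.dat")
```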

Mark

Former Member
0 Kudos

It's really not a matter of which is responsible.  They all are.

What is the connection speed, latency, etc from the server machine to the NAS? How is it connected to the NAS?  NFS?  CIFS?  iSCSI? 

What is the maximum sustained rate at which the NAS can write to the disks?

Which platform is the IQ server on?  Have you tuned the os, network and any file system options? If so, how?

Have you tuned the NAS?  If so, how?

Are you current on the patches/firmware for all components?

jason

markmumy
Advisor
0 Kudos

Adding to this....  If you want to see how much of an impact NAS has on your system and the process, change the output name of the TEMP_EXTRACT_NAMEx options to /dev/null, then rerun your extract.

Also, is NAS using the same network that the other users are?  That could be an issue since there is a finite bandwidth to share.

What do the extractions look like?  Are there where clauses and search arguments?  Joins?  These will impact the degree of parallelism that the extraction can use to resolve the rows.  You could end up with a high degree of parallelism on your extracts that is taking resources from other jobs and users.

Mark

Former Member
0 Kudos

I would opt for writing to a local file system instead of /dev/null because it is a better representation of how much writing to the NAS system impacts performance on the IQ system.

On some NAS systems, performance will go down as the file(s) become larger.  A dd to the NAS system from the IQ box (and other boxes) of similar sized files may give a little more indication as to where the bottleneck may be.   Be warned that, depending on the storage mechanism used by the NAS, a test that reads from /dev/zero can give misleading results: block 8000 is the same as every other block, so the NAS may not actually write all blocks to physical disk.
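A minimal Python sketch of that kind of write test, using fresh random data for every block so that identical blocks cannot hide the real write cost. The function name, default sizes, and the paths in the usage comment are assumptions; only the time spent in write and fsync is counted, so generating the random data does not skew the result.

```python
import os
import time

def write_throughput_mb_s(path, total_mb=64, block_kb=1024):
    """Write total_mb of random data to `path` in block_kb-sized writes
    and return the measured write rate in MB/s."""
    block_bytes = block_kb * 1024
    blocks = (total_mb * 1024) // block_kb
    elapsed = 0.0
    with open(path, "wb") as f:
        for _ in range(blocks):
            data = os.urandom(block_bytes)     # new random block each time
            t0 = time.perf_counter()
            f.write(data)
            elapsed += time.perf_counter() - t0
        t0 = time.perf_counter()
        f.flush()
        os.fsync(f.fileno())                   # include the flush to storage
        elapsed += time.perf_counter() - t0
    return (blocks * block_bytes) / (1024 * 1024) / max(elapsed, 1e-9)

# Hypothetical usage: run against a local path and a NAS mount and compare.
# print(write_throughput_mb_s("/local/tmp/test.bin", total_mb=20480))
# print(write_throughput_mb_s("/mnt/nas/test.bin", total_mb=20480))
```

Running it with several block sizes and with file sizes close to the real extracts (about 20 GB here) should show whether the rate drops as the file grows.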

On a side note, benchmarking tools may be worthwhile to run against the NAS system to obtain the optimal block size, packet size, etc.  There are dozens both commercial and open source.  One example would be Bonnie++ (http://en.wikipedia.org/wiki/Bonnie%2B%2B). 

jason

jin-up_shim
Explorer
0 Kudos

Thanks for the reply, Jason.

Regardless of the connection speed, latency, etc. of the NAS and how they connect to it, the fact is that their NAS system is slow.

In fact, why their NAS system is slow is not my concern.

My major concern is how an extract operation on slow NAS can influence the performance of other IQ operations, beyond the simple influence of one additional SQL statement.

For example:

Slow I/O on the NAS system increases CPU wait time, so other operations cannot use the CPU as much as in a normal situation. That's why other operations slow down.

In fact, when a file was extracted to the NAS, CPU wait and sys usage were high and user usage was low. Disk reads/writes (on the SAN storage for dbspaces) also decreased.

This is just one possible explanation. Something else may actually be causing the issue.

Tuning the OS, network, and file system options is the next step.

Thank you

jin-up_shim
Explorer
0 Kudos

Thanks for the reply, Mark.

When I extracted the file to SAN storage instead of /dev/null, no issue occurred.

The extract query was very simple.

It has just one where condition and no joins, as follows.

select * from kfb_gct_mmly_dat

where bse_yymm like '2013%'

As you said, their NAS uses the same network as the other users. But that doesn't look like the issue,

because the other operations that slowed down were server-side operations such as insert ... select, delete, etc.

Please read my answer to Jason.

Thank you.