
SLES 12 and "load avg." VERY high

markus_doehr2
Active Contributor

I've noticed that the load average on a SLES 12 system is very high:

top - 10:17:13 up 1 day, 17:26,  1 user,  load average: 1251.61, 999.30, 752.61
Tasks: 3324 total,   1 running, 3322 sleeping,   0 stopped,   1 zombie
%Cpu(s):  5.8 us,  5.5 sy,  0.0 ni, 58.3 id, 29.8 wa,  0.0 hi,  0.6 si,  0.0 st
KiB Mem:  33012816 total, 32728924 used,   283892 free,    81932 buffers
KiB Swap: 31456252 total,    77724 used, 31378528 free. 27046296 cached Mem

There are almost 3000 'kworker' kernel threads forked/running:

# ps -ef | grep kworker | wc -l

2914
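Note that ps -ef | grep kworker usually also matches the grep process itself, so that count can be one too high. A minimal sketch of counting the threads directly from /proc, assuming that kernel threads such as kworker appear as numeric /proc/<pid> directories with a comm file; count_procs_with_prefix and is_pid_dir are hypothetical helper names:

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Return nonzero if name consists only of decimal digits (a PID directory). */
static int is_pid_dir(const char *name)
{
    if (*name == '\0')
        return 0;
    for (; *name; name++)
        if (!isdigit((unsigned char)*name))
            return 0;
    return 1;
}

/* Count processes under proc_root whose comm starts with prefix.
 * Returns -1 if proc_root cannot be opened. Processes that exit
 * during the scan are simply skipped. */
int count_procs_with_prefix(const char *proc_root, const char *prefix)
{
    DIR *dir = opendir(proc_root);
    if (dir == NULL)
        return -1;

    size_t plen = strlen(prefix);
    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (!is_pid_dir(ent->d_name))
            continue;

        char path[512];
        snprintf(path, sizeof path, "%s/%s/comm", proc_root, ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue; /* process vanished or is not readable */

        char comm[64];
        if (fgets(comm, sizeof comm, f) != NULL &&
            strncmp(comm, prefix, plen) == 0)
            count++;
        fclose(f);
    }
    closedir(dir);
    return count;
}
```

Calling count_procs_with_prefix("/proc", "kworker") returns the number of matching entries, or -1 if /proc cannot be opened.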

The machine has 20 CPUs and is currently under load, but this seems like far too much. The box is otherwise responsive; it's just that migrating to SLES 12 would mean completely reconfiguring our monitoring.

Is that considered "normal"?

--

Markus

Accepted Solutions (1)


Former Member

Hi Markus,

Great question. I don't have a SLES 12 system to look at, but I couldn't reproduce the issue on my SLES 11 SP3 box.

My Linux kernel version is 3.0.76.


Just curious: what is your kernel version?

As you probably already know, the loadavg shown by top is derived from /proc/loadavg.

/proc/loadavg is generated by loadavg.c in the kernel source (i.e. /usr/src/linux/fs/proc/loadavg.c).

The section of loadavg.c that produces this output is:

static int loadavg_proc_show(struct seq_file *m, void *v)
{
    unsigned long avnrun[3];

    get_avenrun(avnrun, FIXED_1/200, 0);

    seq_printf(m, "%lu.%02lu %lu.%02lu %lu.%02lu %ld/%d %d\n",
        LOAD_INT(avnrun[0]), LOAD_FRAC(avnrun[0]),
        LOAD_INT(avnrun[1]), LOAD_FRAC(avnrun[1]),
        LOAD_INT(avnrun[2]), LOAD_FRAC(avnrun[2]),
        nr_running(), nr_threads,
        task_active_pid_ns(current)->last_pid);
    return 0;
}
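For monitoring scripts, the five fields of /proc/loadavg map directly onto the seq_printf format above: three averages, runnable/total threads, and the last PID. A minimal parsing sketch; struct loadavg_info, parse_loadavg and read_loadavg are hypothetical names, not an existing library API:

```c
#include <stdio.h>

/* Fields of one /proc/loadavg line, in the order loadavg_proc_show()
 * prints them. */
struct loadavg_info {
    double avg1, avg5, avg15;
    long running;   /* nr_running(): tasks currently runnable */
    int threads;    /* nr_threads: total threads in the system */
    int last_pid;
};

/* Parse a /proc/loadavg line; returns 0 on success, -1 on malformed input. */
int parse_loadavg(const char *line, struct loadavg_info *out)
{
    int n = sscanf(line, "%lf %lf %lf %ld/%d %d",
                   &out->avg1, &out->avg5, &out->avg15,
                   &out->running, &out->threads, &out->last_pid);
    return n == 6 ? 0 : -1;
}

/* Read and parse the live file; returns -1 if it cannot be read. */
int read_loadavg(struct loadavg_info *out)
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    return ok != NULL ? parse_loadavg(buf, out) : -1;
}
```

The running field only counts runnable tasks, so comparing it with the 1-minute average is a quick way to see whether the load comes from the run queue at all.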


It might just be a display issue.

As I learned while digging around in the kernel code, there are a lot of fixed-point number conversions involved in the loadavg.
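The conversions come from the kernel storing load averages as fixed-point integers with 11 fractional bits (FSHIFT = 11, so FIXED_1 = 2048 represents 1.0). A small sketch mirroring the LOAD_INT/LOAD_FRAC macros and the FIXED_1/200 rounding offset passed to get_avenrun() in the code above; FIXED_1 is written as an unsigned long here, and format_load is a hypothetical helper:

```c
#include <stdio.h>

/* Fixed-point layout mirroring the kernel's macros: 11 fractional bits. */
#define FSHIFT   11
#define FIXED_1  (1UL << FSHIFT)
#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/* Format a fixed-point load value the way loadavg_proc_show() does:
 * add FIXED_1/200 (about 0.005) so the two-decimal output is rounded
 * rather than truncated, then split into integer part and hundredths. */
void format_load(unsigned long fixed, char *buf, size_t len)
{
    unsigned long v = fixed + FIXED_1 / 200;
    snprintf(buf, len, "%lu.%02lu", LOAD_INT(v), LOAD_FRAC(v));
}
```

For example, a raw value of 3072 (1.5 × 2048) is printed as 1.50.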

If your loadavg.c section looks the same as mine, then I would suspect the calc_load function.

This could very well be an issue specific to your SLES 12 kernel version.
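calc_load() updates each average as an exponential moving average, sampled roughly every 5 seconds, where the active count is the number of runnable tasks plus tasks in uninterruptible sleep (state D). A sketch of that update, modeled on the 3.x-era kernel code; the exact rounding differs between kernel versions, and calc_load_step and simulate_load are hypothetical names:

```c
#include <stdint.h>

#define FSHIFT  11
#define FIXED_1 (1ULL << FSHIFT)
/* Per-sample decay factors in the same fixed-point format:
 * 2048 / e^(5/60), 2048 / e^(5/300), 2048 / e^(5/900). */
#define EXP_1   1884ULL
#define EXP_5   2014ULL
#define EXP_15  2037ULL

/* One moving-average step in the style of the kernel's calc_load():
 * load and active are fixed-point (active = active task count * FIXED_1).
 * Adding half of FIXED_1 before the shift rounds to nearest. */
static uint64_t calc_load_step(uint64_t load, uint64_t decay, uint64_t active)
{
    load *= decay;
    load += active * (FIXED_1 - decay);
    load += 1ULL << (FSHIFT - 1);
    return load >> FSHIFT;
}

/* Apply `samples` updates with a constant number of active tasks,
 * starting from `load`; returns the new fixed-point average. */
uint64_t simulate_load(uint64_t load, uint64_t decay, unsigned active_tasks,
                       int samples)
{
    for (int i = 0; i < samples; i++)
        load = calc_load_step(load, decay, (uint64_t)active_tasks * FIXED_1);
    return load;
}
```

With a constant 1000 active tasks starting from 0, twelve samples (about one minute) bring the 1-minute average to roughly 632, about 1 - 1/e of the way. Because the average is a weighted mean of the sampled counts, a 1-minute value of 1251 means more than 1000 tasks were counted as active recently; with only 1 task running in the top snapshot, tasks in state D are the likely contributors.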

markus_doehr2
Active Contributor

The SLES 12 kernel is 3.12.44-52.10-default.

I think the explanation for that is here:

https://raw.githubusercontent.com/torvalds/linux/master/Documentation/workqueue.txt

I just wonder whether 3000+ kernel threads under moderate load is fine. If that is the case (which it seems to be), we will have to adapt our monitoring accordingly and won't be able to use the load average value anymore 🙂

--

Markus

Former Member

Thanks for the link. Just curious:

What does a vmstat 5 5 show when the loadavg is so high?

Do you see many processes on the run queue?
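As far as I know, the r and b columns of vmstat come from the procs_running and procs_blocked counters in /proc/stat, so they can also be read directly. A minimal sketch; match_procs_line and read_procs_counts are hypothetical names:

```c
#include <stdio.h>

/* If line is the procs_running or procs_blocked line of /proc/stat,
 * store its value and return 1 or 2 respectively; otherwise return 0. */
int match_procs_line(const char *line, long *running, long *blocked)
{
    if (sscanf(line, "procs_running %ld", running) == 1)
        return 1;
    if (sscanf(line, "procs_blocked %ld", blocked) == 1)
        return 2;
    return 0;
}

/* Read both counters from /proc/stat; returns 0 if both were found.
 * Very long lines (e.g. "intr") are read in chunks, which is harmless
 * because only lines starting with "procs_" can match. */
int read_procs_counts(long *running, long *blocked)
{
    char line[4096];
    int found = 0;
    FILE *f = fopen("/proc/stat", "r");
    if (f == NULL)
        return -1;
    while (fgets(line, sizeof line, f) != NULL)
        found |= match_procs_line(line, running, blocked);
    fclose(f);
    return found == 3 ? 0 : -1;
}
```

If running stays low while blocked is in the hundreds, the load average is being driven by tasks waiting (typically on I/O) rather than by CPU demand.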

From your top output (only 1 task running), it looks like the loadavg should be around 1, not 1251, which is why I first thought it might be a printing mistake.

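A high number of sleeping processes should not increase your loadavg, unless the kworker threads are doing some kind of I/O and are not really sleeping. Linux counts tasks in uninterruptible sleep (state D, typically waiting on I/O) in the load average, alongside runnable tasks.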

Thanks...
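To check whether the kworker threads are really sleeping, the state field of /proc/<pid>/stat can be inspected (R = running, S = interruptible sleep, D = uninterruptible sleep). A sketch of parsing it; parse_stat_line and read_pid_state are hypothetical names. The name is taken between the first '(' and the last ')' because process names may contain spaces or parentheses:

```c
#include <stdio.h>
#include <string.h>

/* Parse "pid (comm) state ..." from a /proc/<pid>/stat line.
 * The state is the character after the LAST ')'.
 * Returns 0 on success, -1 on malformed input. */
int parse_stat_line(const char *line, char *comm, size_t comm_len, char *state)
{
    const char *lparen = strchr(line, '(');
    const char *rparen = strrchr(line, ')');
    if (comm_len == 0 || lparen == NULL || rparen == NULL ||
        rparen < lparen || rparen[1] != ' ')
        return -1;

    size_t n = (size_t)(rparen - lparen - 1);
    if (n >= comm_len)
        n = comm_len - 1; /* truncate long names to fit the buffer */
    memcpy(comm, lparen + 1, n);
    comm[n] = '\0';

    *state = rparen[2];
    return *state != '\0' ? 0 : -1;
}

/* Read the name and state of one process; returns -1 if it is gone. */
int read_pid_state(long pid, char *comm, size_t comm_len, char *state)
{
    char path[64], line[1024];
    snprintf(path, sizeof path, "/proc/%ld/stat", pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    return ok != NULL ? parse_stat_line(line, comm, comm_len, state) : -1;
}
```

Looping read_pid_state() over the numeric entries of /proc and tallying the states of names starting with kworker would show how many of them are in D.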
