Re: /proc/loadavg Misleading on 2.2.14-12.
Hi Erik,
> > However, when the machine gets really loaded, and vmstat's idle
> > column is zero for most of an hour I find /proc/loadavg reports a
> > misleadingly low figure for the last minute's average number of
> > processes in the run queue.
>
> Which means that most of the processes on your machine are NOT in the
> run queue, but are waiting for I/O or are swapped in or out. IMHO I/O
> is quite a normal situation for a web server; swapping is not.
If that were the case, wouldn't I see some indication in vmstat's `b'
column? I saw nearly constant zeroes in that column (as is always the
case), and idle time was nearly always zero. Besides, the web server
access log suggests that it was *very* busy, more so than normal.
(There was no swapping; si and so were 0.)
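To line the kernel's figures up against vmstat output, the /proc/loadavg line can be sampled and parsed directly. This is only a minimal sketch: it assumes the line has the form "1min 5min 15min running/total last_pid", and the struct and function names are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical container for one /proc/loadavg sample. */
struct loadavg {
    double one, five, fifteen;  /* 1, 5 and 15 minute averages */
    int running, total;         /* runnable tasks / total tasks */
    int last_pid;               /* most recently allocated pid */
};

/* Parse one line in the assumed format; returns 0 on success, -1 otherwise. */
int parse_loadavg(const char *line, struct loadavg *la)
{
    int n = sscanf(line, "%lf %lf %lf %d/%d %d",
                   &la->one, &la->five, &la->fifteen,
                   &la->running, &la->total, &la->last_pid);
    return n == 6 ? 0 : -1;
}

/* Read and parse the live file; returns 0 on success, -1 on any failure. */
int read_loadavg(struct loadavg *la)
{
    char buf[128];
    FILE *f = fopen("/proc/loadavg", "r");

    if (!f)
        return -1;
    if (!fgets(buf, sizeof buf, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return parse_loadavg(buf, la);
}
```

Logging the running/total field once a second next to vmstat's `r' and `b' columns would show whether the instantaneous run queue really is short while idle time is zero.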
These are the hits per hour. The load averages were OK until about
11:00, when they dropped and appeared too low compared to the work
being done: for example, 1.36 compared with the 8 or higher that this
number of hits normally gives. And the nature of the requests was no
different to normal. At 14:21 the machine crashed. Nothing in the log
files indicates why.
Hour    Hits
10     98141
11    100755
12     94966
13    117973
14     41248  (21 mins)
15     52776  (26 mins)
16    104627
17     84583
18     65878
> [Very helpful explanation of load average calculation snipped.]
>
> Going back to your question: no it doesn't overflow unless you push
> your web server to have more than 1029 active tasks, which is very
> unlikely because the machine will be brought to a crawling halt
> because of memory problems.
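The overflow bound quoted above can be reproduced with a little arithmetic. The sketch below assumes the fixed-point constants from the 2.2-era include/linux/sched.h (FSHIFT of 11, EXP_1 1884, EXP_5 2014, EXP_15 2037) and a 32-bit unsigned long, emulated here with uint32_t. The function names are hypothetical, and this is one reading of where 1029 comes from, not the snipped explanation itself.

```c
#include <stdint.h>

#define FSHIFT   11                /* bits of fractional precision */
#define FIXED_1  (1U << FSHIFT)    /* 1.0 as fixed point */
#define EXP_1    1884u             /* 1/exp(5sec/1min) as fixed point */
#define EXP_5    2014u             /* 1/exp(5sec/5min) */
#define EXP_15   2037u             /* 1/exp(5sec/15min) */

/*
 * One update step in 32-bit arithmetic, mirroring the CALC_LOAD macro.
 * n is the number of active tasks already scaled by FIXED_1.
 * Unsigned overflow wraps silently, as it would in the kernel.
 */
uint32_t calc_load32(uint32_t load, uint32_t exp, uint32_t n)
{
    load *= exp;
    load += n * (FIXED_1 - exp);
    load >>= FSHIFT;
    return load;
}

/*
 * Largest task count for which the product load * exp still fits in
 * 32 bits, assuming the average has settled at that task count.
 */
uint32_t max_tasks_before_overflow(uint32_t exp)
{
    return (uint32_t)(UINT32_MAX / ((uint64_t)FIXED_1 * exp));
}
```

Under these assumptions max_tasks_before_overflow(EXP_15) gives 1029, matching the figure quoted above, and the 1-minute average tolerates slightly more (1113). The addition that follows the multiply would wrap a little sooner, at about 1024 tasks in steady state. Either way the bound is far above the roughly 600 processes mentioned in this thread.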
Hmm. Active tasks at that one instant in time when the list is
checked. No, I doubt it. At peak loads there are probably 200 httpds,
400 mysqlds, plus misc. other stuff, i.e. the maximum application
settings are reached.
> > I'm also puzzled as to the addition of 0.05 in
> >
> > http://lxr.linux.no/source/fs/proc/proc_misc.c#L86
> >
> > Why? Is it a fudge factor? What's it for?
>
> It's a standard trick for proper rounding in an integer or fixed
> point situation (actually, it's 1/200 = 0.005).
Yes, of course. I should have seen that.
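For the archive, here is a small sketch of that rounding trick. It assumes the LOAD_INT and LOAD_FRAC macros and the FSHIFT value of 11 from the 2.2-era headers; the two helper functions are hypothetical and exist only to show the effect of adding FIXED_1/200 before truncating to two decimals.

```c
#include <stdio.h>

#define FSHIFT   11                  /* bits of fractional precision */
#define FIXED_1  (1UL << FSHIFT)     /* 1.0 as fixed point */
#define LOAD_INT(x)  ((x) >> FSHIFT)
#define LOAD_FRAC(x) LOAD_INT(((x) & (FIXED_1 - 1)) * 100)

/*
 * Convert a fixed-point load value to hundredths (1.36 -> 136).
 * With round set, 0.005 (FIXED_1/200) is added first so that the
 * truncation in LOAD_FRAC rounds to nearest instead of rounding down.
 */
unsigned long load_hundredths(unsigned long load, int round)
{
    if (round)
        load += FIXED_1 / 200;
    return LOAD_INT(load) * 100 + LOAD_FRAC(load);
}

/* Print the value the way /proc/loadavg formats each average. */
void print_load(unsigned long load, int round)
{
    unsigned long h = load_hundredths(load, round);

    printf("%lu.%02lu\n", h / 100, h % 100);
}
```

A raw value of 2785 (about 1.36 * 2048) prints as 1.35 when truncated and 1.36 with the rounding term; 4094 (about 1.999) prints as 1.99 versus 2.00.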
The machine crashed after reporting these weird figures for a while.
Could a hardware fault, such as overheating, cause this kind of effect?
It's been rock-solid for many months, including under high load, just
not quite this high. No changes have been made to the system;
IINBDFI.
Investigating the cause is what led me to look at the strange low load
average figures.
Thanks,
Ralph.
-
Kernelnewbies: Help each other learn about the Linux kernel.
Archive: http://mail.nl.linux.org/
IRC Channel: irc.openprojects.net / #kernelnewbies
Web Page: http://www.kernelnewbies.org/