
Re: update re: fork() failures in 2.1.103



RECAP: In 2.1.99, 2.1.101, 2.1.103, and 2.1.104-pre1, my system has been
usable for only ~1 day with 32 MB of memory, or ~2.5 days with 48 MB.
After that, fork() starts failing, typically with EAGAIN.  Killing off a
few processes relieves the situation temporarily, but the errors always
reappear soon afterwards.  I have sent in the Shift-ScrollLock memory
report, which Rik thinks does not look typical of excessive memory
fragmentation.

Now, I have scripts that run "ifconfig ppp0" hourly (to check whether PPP
is "UP").  Recently I joined the modern era by upgrading net-tools from
1.432 to 1.45.  The fork errors have gone away (at least for uptimes
twice as long as those above).  When I changed these scripts to run
"/sbin/ifconfig.old ppp0" instead, the errors came back.

Running the old ifconfig (when the problem arises) would put "kmod: fork
failed, errno 11" messages in the logfiles.  The new ifconfig doesn't.
Running strace on "ifconfig ppp0" shows that the old version makes the
following system calls that the new one doesn't:

> socket(PF_??? (0x4), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x4), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x4), SOCK_DGRAM, , 0)   = -1 EINVAL (Invalid argument)
> socket(PF_??? (0x3), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x3), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x3), SOCK_DGRAM, , 0)   = -1 EINVAL (Invalid argument) 
> socket(PF_??? (0x5), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x5), SOCK_DGRAM, , 0)   = -1 ENOSYS (Function not implemented)
> socket(PF_??? (0x5), SOCK_DGRAM, , 0)   = -1 EINVAL (Invalid argument)

(I am not sure whether these system calls have been removed from the
new ifconfig, or whether I merely configured net-tools without support
for AppleTalk, etc.)
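The calls above look like the old ifconfig probing for AX.25 (family 3),
IPX (4) and AppleTalk (5) by opening a datagram socket in each family.
Below is a minimal sketch of the same probe, assuming Linux family
numbering; probe_family() and probe_legacy_families() are hypothetical
names.  One possibly relevant detail: on kernels built with CONFIG_KMOD,
socket() for a family with no registered handler asks kmod to load a
module named net-pf-N, and kmod starts a helper process to do that, which
might connect these probes to the "kmod: fork failed" messages.  The
sketch only reports errno; it does not demonstrate that link.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try to open a datagram socket in the given
 * address family.  Returns 0 if that succeeds (the socket is closed
 * again immediately), otherwise the errno value socket() set. */
int probe_family(int family)
{
    int fd = socket(family, SOCK_DGRAM, 0);
    if (fd < 0)
        return errno;
    close(fd);
    return 0;
}

/* Probe the three families seen in the strace output, in the same
 * order, and print the result for each one. */
void probe_legacy_families(void)
{
    /* Assumed Linux numbering: 3 = AF_AX25, 4 = AF_IPX, 5 = AF_APPLETALK. */
    static const struct { int family; const char *name; } fams[] = {
        { 4, "IPX" }, { 3, "AX.25" }, { 5, "AppleTalk" },
    };
    for (size_t i = 0; i < sizeof fams / sizeof fams[0]; i++) {
        int err = probe_family(fams[i].family);
        printf("family %d (%s): %s\n", fams[i].family, fams[i].name,
               err ? strerror(err) : "supported");
    }
}
```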

Something about my old ifconfig must be triggering a bug (or hardware
error?) somewhere.  I am willing to take further suggestions for
experiments to try, if anyone is still interested.

	-Paul <kimoto@lightlink.com>
	 (please cc: relevant messages to me)