
spurious ``wait: No child processes'' messages



I wrote:
> ... though es still sometimes reports ``wait: No child processes''
> when processes are interrupted.  No idea why that happens.

Ok, I have a bit of a clue, which leaves me wondering why this ever
worked before.

The symptom is that es prints ``wait: No child processes'', even though
everybody knows there was a process to wait on.

The key routine is:

	/* dowait -- a wait wrapper that interfaces with signals */
	static int dowait(int *statusp) {
		int n;
		interrupted = FALSE;
		if (!setjmp(slowlabel)) {
			slow = TRUE;
			n = interrupted ? -2 :
	#if HAVE_WAIT3
				wait3((void *) statusp, 0, &wait_rusage);
	#else
				wait((void *) statusp);
	#endif
		} else
			n = -2;
		slow = FALSE;
		if (n == -2) {
			errno = EINTR;
			n = -1;
		}
		return n;
	}

This time, strace was quite helpful.  Basically, what seems to happen is
that wait() returns -- in this case, wait4, which is invoked by wait3 --
but before the result gets propagated anywhere useful, the signal comes
in.  So, the process has been waited for, but we never do anything
useful with the status.
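
To make the window concrete, here is a minimal sketch in C that forces
the bad timing.  It is not es code: catcher, wait_then_signal and
demo_lost_wait are made-up names, and raise() stands in for a real
interrupt that happens to arrive just after wait returns.  It assumes
the handler jumps out whenever slow is set.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static sigjmp_buf slowlabel;
static volatile sig_atomic_t slow = 0;

/* Handler in the style of the ``if (slow)'' check (assumed shape). */
static void catcher(int sig) {
	(void) sig;
	if (slow)
		siglongjmp(slowlabel, 1);
}

/* Hypothetical stand-in for wait(): reap the child, then force the
 * signal to arrive before the caller can store the return value. */
static pid_t wait_then_signal(int *statusp) {
	pid_t pid = wait(statusp);
	raise(SIGINT);			/* simulated badly timed interrupt */
	return pid;
}

/* Returns the errno from retrying wait() after the interrupted call,
 * or -1 if the demo could not be set up or did not trigger. */
static int demo_lost_wait(void) {
	struct sigaction sa;
	volatile pid_t n = 0;
	pid_t child;
	int status;

	sa.sa_handler = catcher;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGINT, &sa, NULL) == -1)
		return -1;

	child = fork();
	if (child == -1)
		return -1;
	if (child == 0)
		_exit(0);

	if (!sigsetjmp(slowlabel, 1)) {
		slow = 1;
		n = wait_then_signal(&status);	/* child is reaped here... */
	} else
		n = -2;				/* ...but we end up here */
	slow = 0;

	if (n != -2)
		return -1;
	errno = 0;
	if (wait(&status) == -1)	/* what a caller retrying after EINTR sees */
		return errno;
	return -1;
}
```

The retry fails with ECHILD, which is the error behind ``No child
processes'': the child was reaped by the first call, but its pid never
made it into n.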

Judging from the code, the ``if (slow)'' mechanism is intended to deal
with this case, but it has a race condition of its own -- if the signal
arrives after wait returns but before the global slow gets set to FALSE,
the handler presumably still jumps back to slowlabel, n is never
assigned, and data is lost.  In this case, what's lost is our memory of
having waited for the process, so the next wait finds no children.
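
One direction that might close the window, offered as an untested idea
rather than a patch for es: drop the setjmp/longjmp trick, keep the
signals blocked while reaping, and only let them in while sleeping in
sigsuspend.  Everything named below (safe_wait, note_interrupt,
note_child, install_handlers) is hypothetical, and the sketch ignores
the rusage that wait3 collects.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static volatile sig_atomic_t interrupted = 0;

static void note_interrupt(int sig) { (void) sig; interrupted = 1; }
static void note_child(int sig) { (void) sig; }	/* only wakes sigsuspend */

static void install_handlers(void) {
	struct sigaction sa;

	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	sa.sa_handler = note_interrupt;
	sigaction(SIGINT, &sa, NULL);
	sa.sa_handler = note_child;
	sigaction(SIGCHLD, &sa, NULL);
}

/* safe_wait -- like wait(), but returns -1 with errno == EINTR if SIGINT
 * arrives first; a child that has been reaped is always reported */
static pid_t safe_wait(int *statusp) {
	sigset_t block, old, waitmask;
	pid_t pid;
	int saved;

	sigemptyset(&block);
	sigaddset(&block, SIGINT);
	sigaddset(&block, SIGCHLD);
	sigprocmask(SIG_BLOCK, &block, &old);
	waitmask = old;			/* mask to sleep with: both unblocked */
	sigdelset(&waitmask, SIGINT);
	sigdelset(&waitmask, SIGCHLD);

	interrupted = 0;
	for (;;) {
		pid = waitpid(-1, statusp, WNOHANG);
		if (pid != 0)
			break;		/* reaped a child, or a real error */
		if (interrupted) {
			errno = EINTR;
			pid = -1;
			break;
		}
		sigsuspend(&waitmask);	/* handlers can only run in here */
	}
	saved = errno;
	sigprocmask(SIG_SETMASK, &old, NULL);
	errno = saved;
	return pid;
}
```

Since the handlers can only run inside sigsuspend, a signal can no
longer land between the reaping call and the store of its result.  The
same loop should work with wait3 and WNOHANG where the rusage matters,
though I have not tried it.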

Anyone got any ideas how to fix this?  Or why it worked in the first
place?

--p