
Re: Possible fix for the signal-handling problem in es-0.9-beta1.tar.gz



Loren wrote:

> It is always great to hear from the author of the software that they
> still run it. ;-)

I guess that wasn't exactly the confidence-boosting statement I intended
it to be.

> > but I've been meaning to fix the damn control-C on Linux bug for a
> > long time.
> 
> OK, the signal-handling code for interactive use could still use some
> minor tuning. ;-)

To put it mildly.

> Of course, I could never do the amount of work on es that you did, but
> as a small token of my gratitude, I have finally attempted to debug
> this problem in earnest and produce a patch for it.  It works for me
> on FreeBSD, OSF1 and Solaris (sorry, can't test Linux at the moment).

Unfortunately, it doesn't appear to fix the bug I've been seeing on
Linux.

> If you are referring to the same long-standing bug that I am aware of
> (and that I can reproduce on Linux, Solaris, FreeBSD and OSF1), then
> it is not Linux-specific.  I assume that Linux also configures to
> HAVE_SIGACTION=1

It does.

> (even if it doesn't, I have also found an unrelated regression that
> affects platforms that configure to HAVE_SIGACTION=0, SYSV_SIGNALS=1).
> If so, then you may enjoy this analysis and possible patch.
> 
> Below is a patch that was found tonight by manual inspection after
> getting a handle on the failure mode.  I found that the file-scoped
> variable ``blocked'' in signal.c became non-zero forever after the
> first interrupt was processed inside a shell-level while loop.  Thus,
> clearly, an imbalance in calls to blocksignals()/unblocksignals() must
> be present in some code path.

That makes a lot of sense as an explanation.  However, I still get the
symptoms, which is an unexpected ``wait: No child processes'' on the
first interrupt, and no effect from the second interrupt.
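For anyone following along without the source handy, here is a minimal sketch of the kind of imbalance Loren describes.  It is not the es code: only the names blocked, blocksignals() and unblocksignals() come from the message above, and signals_blocked(), unbalanced_path() and balanced_path() are hypothetical helpers invented for illustration.  It assumes a nesting counter that masks signals on the outermost call.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Assumed nesting counter, modelled on the ``blocked'' variable in the
 * message above; this is an illustration, not the es source. */
static int blocked = 0;
static sigset_t saved_mask;

/* Mask all signals on the outermost call; nested calls only count. */
void blocksignals(void) {
    if (blocked++ == 0) {
        sigset_t all;
        sigfillset(&all);
        sigprocmask(SIG_BLOCK, &all, &saved_mask);
    }
}

/* Restore the saved mask once the outermost block is released. */
void unblocksignals(void) {
    assert(blocked > 0);            /* an extra unblock is also a bug */
    if (--blocked == 0)
        sigprocmask(SIG_SETMASK, &saved_mask, NULL);
}

/* Hypothetical accessor so the counter can be inspected. */
int signals_blocked(void) {
    return blocked;
}

/* Hypothetical buggy path: the early return skips unblocksignals(),
 * so the counter stays non-zero after the first "interrupt". */
int unbalanced_path(int interrupted) {
    blocksignals();
    if (interrupted)
        return -1;
    unblocksignals();
    return 0;
}

/* Hypothetical fixed path: every exit releases the block. */
int balanced_path(int interrupted) {
    int result = 0;
    blocksignals();
    if (interrupted)
        result = -1;
    unblocksignals();
    return result;
}
```

After one call to unbalanced_path(1) the counter stays at 1 and every signal remains masked, which would be consistent with a second interrupt having no effect.  Whether es fails in exactly this way is not something this sketch can establish.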

> I hope I am not just trading the obvious fix of the call pairing for a
> race condition in some other case (I figure that you can judge this
> far faster than I).

Alas, it's been so long since I looked at es's signal handling, I have
no sense of what is supposed to happen.

> Although perhaps benign whenever HAVE_SIGACTION=1, I also removed the
> gratuitous signal() call in catcher() when SYSV_SIGNALS=0 and,
> conversely, added the needed signal() call in catcher() when
> SYSV_SIGNALS=1.  According to my local CVS archive of es, someone
> clearly hosed this between ES-0_9-ALPHA1 and ES-0_9-BETA1.

Interesting.  No memory at all of what that code does.
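As a reminder of what that change is about, here is a rough sketch, assuming the usual distinction between the two flavours of signal(): with System V semantics the disposition is reset to the default when the handler runs, so the handler has to reinstall itself, while with BSD semantics the handler stays installed and the extra call is unnecessary.  Only catcher() and SYSV_SIGNALS are names from the message above; the variable caught and the function install_and_raise() are hypothetical, and none of this is the es source.

```c
#include <assert.h>
#include <signal.h>

/* Assumed configuration macro, named after the one in the message above. */
#ifndef SYSV_SIGNALS
#define SYSV_SIGNALS 0
#endif

/* Hypothetical flag recording the last signal seen by the handler. */
static volatile sig_atomic_t caught = 0;

static void catcher(int sig) {
#if SYSV_SIGNALS
    /* System V semantics: the disposition was reset to SIG_DFL on
     * delivery, so the handler must be installed again. */
    signal(sig, catcher);
#endif
    caught = sig;
}

/* Hypothetical helper: install catcher(), deliver one signal to this
 * process, and report which signal the handler recorded. */
int install_and_raise(int sig) {
    int rc;

    caught = 0;
    if (signal(sig, catcher) == SIG_ERR)
        return -1;
    rc = raise(sig);    /* returns only after the handler has run */
    assert(rc == 0);
    (void)rc;           /* keep quiet if asserts are compiled out */
    return caught;
}
```

The helper reinstalls the handler before each raise(), so it behaves the same under either flavour.  The difference only shows up when a second signal arrives after a single installation.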


Harald wrote:

> (Actually, I recruited yet another es user recently, perhaps in the
> process doubling the number of es users in Norway.)

(Does that leave Norway with the highest per-capita es usage?)

> I wonder which 0.9-beta1 version you have patched, though:  I think
> there are two of them out there.

Oh, that's scary.

Perhaps after three years of our rigorous testing we should pull it out
of beta and declare es-0.9 ready for use?  Our fans have been waiting.

:-)

--p