Re: Possible fix for the signal-handling problem in es-0.9-beta1.tar.gz
Loren wrote:
> It is always great to hear from the author of the software that they
> still run it. ;-)
I guess that wasn't exactly the confidence-boosting statement I intended
it to be.
> > but I've been meaning to fix the damn control-C on Linux bug for a
> > long time.
>
> OK, the signal-handling code for interactive use could still use some
> minor tuning. ;-)
To put it mildly.
> Of course, I could never do the amount of work on es that you did, but
> as a small token of my gratitude, I have finally attempted to debug
> this problem in earnest and produce a patch for it. It works for me
> on FreeBSD, OSF1 and Solaris (sorry, can't test Linux at the moment).
Unfortunately, it doesn't appear to fix the bug I've been seeing on
Linux.
> If you are referring to the same long-standing bug that I am aware of
> (and that I can reproduce on Linux, Solaris, FreeBSD and OSF1), then
> it is not Linux-specific. I assume that Linux also configures to
> HAVE_SIGACTION=1
It does.
> (even if it doesn't, I have also found an unrelated regression that
> affects platforms that configure to HAVE_SIGACTION=0, SYSV_SIGNALS=1).
> If so, then you may enjoy this analysis and possible patch.
>
> Below is a patch that was found tonight by manual inspection after
> getting a handle on the failure mode. I found that the file-scoped
> variable ``blocked'' in signal.c became non-zero forever after the
> first interrupt was processed inside a shell-level while loop. Thus,
> clearly, an imbalance in calls to blocksignals()/unblocksignals() must
> be present in some code path.
That makes a lot of sense as an explanation. However, I still get the
symptoms: an unexpected ``wait: No child processes'' on the first
interrupt, and no effect from the second interrupt.
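For anyone following along, the kind of imbalance Loren describes can be
sketched roughly as below. This is not the code in signal.c; the names
block_depth, block_signals(), unblock_signals(), balanced_path() and
leaky_path() are made up for illustration, and the sketch assumes a
POSIX system with sigprocmask().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical stand-in for the file-scoped ``blocked'' counter. */
static int block_depth = 0;
static sigset_t saved_mask;

/* Block SIGINT on the outermost call only; nested calls just count. */
void block_signals(void) {
    if (block_depth++ == 0) {
        sigset_t set;
        sigemptyset(&set);
        sigaddset(&set, SIGINT);
        sigprocmask(SIG_BLOCK, &set, &saved_mask);
    }
}

/* Restore the saved mask only when the outermost block is released. */
void unblock_signals(void) {
    assert(block_depth > 0);
    if (--block_depth == 0)
        sigprocmask(SIG_SETMASK, &saved_mask, NULL);
}

/* Report whether SIGINT is currently in the process signal mask. */
int sigint_is_blocked(void) {
    sigset_t cur;
    sigprocmask(SIG_BLOCK, NULL, &cur);  /* NULL set: query only */
    return sigismember(&cur, SIGINT);
}

/* Every block is paired with an unblock, so the depth returns to 0. */
int balanced_path(void) {
    block_signals();
    unblock_signals();
    return block_depth;
}

/* An early return skips the unblock, so the depth stays non-zero
   and SIGINT remains masked from then on. */
int leaky_path(int interrupted) {
    block_signals();
    if (interrupted)
        return block_depth;  /* bug: missing unblock_signals() */
    unblock_signals();
    return block_depth;
}
```

Once a path like leaky_path(1) has run, every later block/unblock pair
only moves the counter between 1 and 2, so the mask is never restored.
That would match a counter that stays non-zero forever after the first
interrupt, though it says nothing about which path in es is at fault.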
> I hope I am not just trading the obvious fix of the call pairing for a
> race condition in some other case (I figure that you can judge this
> far faster than I).
Alas, it's been so long since I looked at es's signal handling that I
have no sense of what is supposed to happen.
> Although perhaps benign whenever HAVE_SIGACTION=1, I also removed the
> gratuitous signal() call in catcher() when SYSV_SIGNALS=0 and,
> conversely, added the needed signal() call in catcher() when
> SYSV_SIGNALS=1. According to my local CVS archive of es, someone
> clearly hosed this between ES-0_9-ALPHA1 and ES-0_9-BETA1.
Interesting. No memory at all of what that code does.
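As a reminder of why that call matters: with System V signal()
semantics the disposition is reset to SIG_DFL when the handler is
entered, so the handler has to reinstall itself, while with BSD
semantics (or a handler installed through sigaction() without
SA_RESETHAND) it stays installed. A minimal sketch, assuming
SYSV_SIGNALS is a 0/1 configuration macro as in the discussion above;
this is not the catcher() from es, and caught_count, last_signal and
install_catcher() are invented for the example.

```c
#include <signal.h>

#ifndef SYSV_SIGNALS
#define SYSV_SIGNALS 0  /* assumed default for this sketch */
#endif

/* Illustrative bookkeeping; not taken from es. */
static volatile sig_atomic_t caught_count = 0;
static volatile sig_atomic_t last_signal = 0;

static void catcher(int sig) {
#if SYSV_SIGNALS
    /* System V signal() resets the disposition to SIG_DFL on
       delivery, so reinstall the handler before doing anything else. */
    signal(sig, catcher);
#endif
    /* With BSD semantics the handler stays installed, so the call
       above would be redundant there. */
    last_signal = sig;
    caught_count++;
}

/* Install catcher() for one signal; returns 0 on success, -1 on error. */
int install_catcher(int sig) {
    return signal(sig, catcher) == SIG_ERR ? -1 : 0;
}
```

With the macro set wrongly on a System V style platform, the first
interrupt would be caught and the second would take the default action,
which for SIGINT kills the process. There is still a small window
between delivery and the reinstall, which is one reason sigaction() is
preferable where it is available.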
Harald wrote:
> (Actually, I recruited yet another es user recently, perhaps in the
> process doubling the number of es users in Norway.)
(Does that leave Norway with the highest per-capita es usage?)
> I wonder which 0.9-beta1 version you have patched, though: I think
> there are two of them out there.
Oh, that's scary.
Perhaps after three years of our rigorous testing we should pull it out
of beta and declare es-0.9 ready for use? Our fans have been waiting.
:-)
--p