Possible fix for the signal-handling problem in es-0.9-beta1.tar.gz



In article <cWdHfF5Et0@iadd.jivetech.com>,
Paul Haahr <haahr@jivetech.com> writes:

>> I have no idea if anyone besides me still runs es (or anyone cares to
>> maintain their copies),

> Well, I still run it.

Hi Paul,

[Since the es list appears to insert a rather large time delay, I have
 explicitly CC'd parties that were known to be interested in this
 problem back in January 2000.]

It is always great to hear from the author of the software that they
still run it. ;-)

> Hasn't needed much maintenance lately,

How true: The only reason I found the minor memory-handling bug is
that I upgraded my operating system recently.  The new version
(FreeBSD 4.2) checks for that exact type of memory-handling error, and
many others, by default.

IIRC, I have hit this problem before, but the crash was not always so
predictable.  At least, I am quite sure I was never able to pinpoint
the errant code.

> but I've been meaning to fix the damn control-C on Linux bug for a
> long time.

OK, the signal-handling code for interactive use could still use some
minor tuning. ;-)

Of course, I could never do the amount of work on es that you did, but
as a small token of my gratitude, I have finally attempted to debug
this problem in earnest and produce a patch for it.  It works for me
on FreeBSD, OSF1 and Solaris (sorry, can't test Linux at the moment).

If you are referring to the same long-standing bug that I am aware of
(and that I can reproduce on Linux, Solaris, FreeBSD and OSF1), then
it is not Linux-specific.  I assume that Linux also configures to
HAVE_SIGACTION=1.  (Even if it doesn't, I have also found an unrelated
regression that affects platforms that configure to HAVE_SIGACTION=0,
SYSV_SIGNALS=1.)  If my assumption holds, then you may enjoy this
analysis and possible patch.

Below is a patch that I arrived at tonight by manual inspection after
getting a handle on the failure mode.  I found that the file-scoped
variable ``blocked'' in signal.c stayed non-zero forever after the
first interrupt was processed inside a shell-level while loop.  Thus,
clearly, an imbalance in calls to blocksignals()/unblocksignals() must
be present in some code path.  I hope I am not just trading the
obvious fix of the call pairing for a race condition in some other
case (I figure that you can judge this far faster than I).
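
The imbalance described above can be illustrated with a minimal sketch.
It is not the es source: it assumes that ``blocked'' is a plain nesting
counter and that the handler path blocks signals once on entry.  The
function run_handler() and its two flags are hypothetical names made up
for this illustration.

```c
#include <assert.h>

/* Stand-in for the file-scoped counter; assumed to be a nesting depth. */
static int blocked = 0;

/* Stand-ins for the real routines; these only adjust the counter. */
static void blocksignals(void)   { blocked++; }
static void unblocksignals(void) { assert(blocked > 0); blocked--; }

/*
 * Hypothetical model of one pass through a catch handler.
 * is_retry:    the handler asked for a retry
 * fix_applied: the non-retry path unblocks before rethrowing
 * Returns the value of blocked after the pass.
 */
static int run_handler(int is_retry, int fix_applied) {
	blocksignals();                 /* assumed: blocked once on entry */
	if (is_retry) {
		unblocksignals();       /* retry path was already balanced */
	} else {
		if (fix_applied)
			unblocksignals();   /* the pairing the patch adds */
		/* a real handler would rethrow the exception here */
	}
	return blocked;
}
```

In this model, a non-retry pass without the extra unblocksignals()
leaves the counter at 1, and nothing later brings it back to 0, which
matches the "non-zero forever" symptom.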

Although perhaps benign whenever HAVE_SIGACTION=1, I also removed the
gratuitous signal() call in catcher() when SYSV_SIGNALS=0 and,
conversely, added the needed signal() call in catcher() when
SYSV_SIGNALS=1.  According to my local CVS archive of es, someone
clearly hosed this between ES-0_9-ALPHA1 and ES-0_9-BETA1.
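
Why the signal() call belongs under SYSV_SIGNALS=1 can be sketched as
follows.  This is an illustration only, not the es code: it assumes the
traditional System V behaviour in which signal() resets the disposition
to SIG_DFL when the handler is entered, so the handler must reinstall
itself.  The names sketch_catcher() and deliver_twice() are
hypothetical, and SYSV_SIGNALS is defined locally for the sketch.

```c
#include <signal.h>

/* Defined here only for the sketch; es gets this from its configuration. */
#ifndef SYSV_SIGNALS
#define SYSV_SIGNALS 1
#endif

static volatile sig_atomic_t caught = 0;

/* Hypothetical handler: counts deliveries, reinstalling itself if needed. */
static void sketch_catcher(int sig) {
#if SYSV_SIGNALS
	/* With System V semantics the disposition is SIG_DFL by now. */
	signal(sig, sketch_catcher);
#endif
	caught++;
}

/*
 * Install the handler for SIGINT, deliver SIGINT twice with raise(),
 * restore the previous disposition, and return how many deliveries the
 * handler saw (or -1 if it could not be installed).
 */
static int deliver_twice(void) {
	void (*old)(int) = signal(SIGINT, sketch_catcher);
	if (old == SIG_ERR)
		return -1;
	caught = 0;
	raise(SIGINT);
	raise(SIGINT);   /* fatal under SysV semantics without the reinstall */
	signal(SIGINT, old);
	return (int)caught;
}
```

On a system where signal() keeps the handler installed, the extra call
is redundant but harmless, which is consistent with the change being
benign when HAVE_SIGACTION=1.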

Regards,
Loren

*** prim-ctl.c.orig	Fri Apr 11 15:54:34 1997
--- prim-ctl.c	Wed Dec 13 00:55:27 2000
***************
*** 77,84 ****
  				if (termeq(fromcatcher->term, "retry")) {
  					retry = TRUE;
  					unblocksignals();
! 				} else
  					throw(fromcatcher);
  			EndExceptionHandler
  
  		EndExceptionHandler
--- 77,86 ----
  				if (termeq(fromcatcher->term, "retry")) {
  					retry = TRUE;
  					unblocksignals();
! 				} else {
! 					unblocksignals();
  					throw(fromcatcher);
+ 				}
  			EndExceptionHandler
  
  		EndExceptionHandler
*** signal.c.orig	Fri Apr 11 15:54:37 1997
--- signal.c	Wed Dec 13 00:59:29 2000
***************
*** 68,74 ****
  
  /* catcher -- catch (and defer) a signal from the kernel */
  static void catcher(int sig) {
! #if !SYSV_SIGNALS /* only do this for unreliable signals */
  	signal(sig, catcher);
  #endif
  	if (hasforked)
--- 68,74 ----
  
  /* catcher -- catch (and defer) a signal from the kernel */
  static void catcher(int sig) {
! #if SYSV_SIGNALS /* only do this for unreliable signals */
  	signal(sig, catcher);
  #endif
  	if (hasforked)