www.delorie.com/archives/browse.cgi   search  
Mail Archives: cygwin/1999/09/24/16:25:04

Mailing-List: contact cygwin-help AT sourceware DOT cygnus DOT com; run by ezmlm
List-Unsubscribe: <mailto:cygwin-unsubscribe-archive-cygwin=delorie DOT com AT sourceware DOT cygnus DOT com>
List-Subscribe: <mailto:cygwin-subscribe AT sourceware DOT cygnus DOT com>
List-Archive: <http://sourceware.cygnus.com/ml/cygwin/>
List-Post: <mailto:cygwin AT sourceware DOT cygnus DOT com>
List-Help: <mailto:cygwin-help AT sourceware DOT cygnus DOT com>, <http://sourceware.cygnus.com/ml/#faqs>
Sender: cygwin-owner AT sourceware DOT cygnus DOT com
Delivered-To: mailing list cygwin AT sourceware DOT cygnus DOT com
From: patl AT cag DOT lcs DOT mit DOT edu (Patrick J. LoPresti)
To: cygwin AT sourceware DOT cygnus DOT com
Subject: Race condition in 19990922 snapshot [PATCH]
Date: 24 Sep 1999 16:22:17 -0400
Message-ID: <s5gyadw9i12.fsf@egghead.curl.com>
Lines: 102

The appended patch fixes a race condition in select.cc.

This code creates a Win32 socket ("exitsock") which it uses to let the
master thread notify a slave thread that it should exit a particular
call to Winsock select().  (I know, technically, that threads don't
really have a master/slave relationship; but look at the code and you
will see what I mean.)

The problem here is twofold:

  1) Doing select() on something created by socket(), when that socket
     has not been connected our bound to anything, does not have
     defined behavior in any documentation I can find.  The code in
     select.cc seems to rely on some weird Winsock behavior; namely,
     that calling closesocket() on a newly-created socket will trigger
     an "exceptional condition" for any select() waiting on it.

  2) There is a race condition anyway.  It is possible for the master
     thread to call closesocket() before the slave thread even gets to
     the select().  The result is to select() on a value which is no
     longer a legitimate socket at all.  I have seen this race happen
     under strace.  The result is (usually) for the select() to return
     with an error code, which Cygwin pretty much ignores and
     continues plowing through.  Sometimes the result is a complete
     hang on our SMP box.  (I do not have details of the failures, but
     I suspect some kind of corruption in the internals of Winsock or
     Cygwin.)

The enclosed patch uses a socketpair() instead of a single socket.
The slave thread now selects for reading on one member of the pair;
the master can close the other end to notify the slave it should fall
out of select().  This fixes both of the problems: The behavior is now
well-defined, and the race condition is avoided.

After installing this patch, most of the Cygwin hangs we were
experiencing are now gone.  At least, I can't reproduce them in 20
seconds like I could before :-).

 - Pat

P.S.  You will get a compile-time warning because socketpair() is not
declared.  I was unable to find it in any .h file or I would have
#included it...


======================================================================

--- cygwin-src/winsup/select.cc	Wed Sep 22 23:57:31 1999
+++ cygwin-src-patl/winsup/select.cc	Fri Sep 24 15:59:42 1999
@@ -991,7 +991,7 @@ struct socketinf
   {
     HANDLE thread;
     winsock_fd_set readfds, writefds, exceptfds;
-    SOCKET exitsock;
+    int exitsockpair[2];
     select_record *start;
   };
 
@@ -1090,8 +1090,9 @@ thread_socket (void *arg)
 	    }
 	}
 
-  if (WINSOCK_FD_ISSET (si->exitsock, &si->exceptfds))
-    select_printf ("exitsock got an exception");
+  if (WINSOCK_FD_ISSET (dtable[si->exitsockpair[1]]->get_handle (),
+                        &si->readfds))
+    select_printf ("exitsockpair was closed");
   return 0;
 }
 
@@ -1133,15 +1134,16 @@ start_thread_socket (select_record *me, 
 	  }
       }
 
-  if ((si->exitsock = socket (PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET)
+  if (socketpair (PF_INET, SOCK_STREAM, 0, si->exitsockpair) != 0)
     {
-      __seterrno ();
       select_printf ("cannot create socket");
       return -1;
     }
 
-  select_printf ("exitsock %p", si->exitsock);
-  WINSOCK_FD_SET ((HANDLE)si->exitsock, &si->exceptfds);
+  select_printf ("exitsocks %d %d", si->exitsockpair[0],
+                 si->exitsockpair[1]);
+  WINSOCK_FD_SET (dtable[si->exitsockpair[1]]->get_handle (),
+                  &si->readfds);
   stuff->device_specific[FHDEVN(FH_SOCKET)] = (void *) si;
   si->start = &stuff->start;
   select_printf ("stuff_start %p", &stuff->start);
@@ -1157,8 +1159,9 @@ socket_cleanup (select_record *me, selec
   select_printf ("si %p si->thread %p", si, si ? si->thread : NULL);
   if (si && si->thread)
     {
-      closesocket (si->exitsock);
+      _close (si->exitsockpair[0]);
       WaitForSingleObject (si->thread, INFINITE);
+      _close (si->exitsockpair[1]);
       CloseHandle (si->thread);
       stuff->device_specific[FHDEVN(FH_SOCKET)] = NULL;
       delete si;

--
Want to unsubscribe from this list?
Send a message to cygwin-unsubscribe AT sourceware DOT cygnus DOT com

- Raw text -


  webmaster     delorie software   privacy  
  Copyright © 2019   by DJ Delorie     Updated Jul 2019