Mailing-List: contact cygwin-help AT sourceware DOT cygnus DOT com; run by ezmlm List-Unsubscribe: List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-owner AT sourceware DOT cygnus DOT com Delivered-To: mailing list cygwin AT sourceware DOT cygnus DOT com From: patl AT cag DOT lcs DOT mit DOT edu (Patrick J. LoPresti) To: cygwin AT sourceware DOT cygnus DOT com Subject: Race condition in 19990922 snapshot [PATCH] Date: 24 Sep 1999 16:22:17 -0400 Message-ID: Lines: 102 The appended patch fixes a race condition in select.cc. This code creates a Win32 socket ("exitsock") which it uses to let the master thread notify a slave thread that it should exit a particular call to Winsock select(). (I know, technically, that threads don't really have a master/slave relationship; but look at the code and you will see what I mean.) The problem here is twofold: 1) Doing select() on something created by socket(), when that socket has not been connected our bound to anything, does not have defined behavior in any documentation I can find. The code in select.cc seems to rely on some weird Winsock behavior; namely, that calling closesocket() on a newly-created socket will trigger an "exceptional condition" for any select() waiting on it. 2) There is a race condition anyway. It is possible for the master thread to call closesocket() before the slave thread even gets to the select(). The result is to select() on a value which is no longer a legitimate socket at all. I have seen this race happen under strace. The result is (usually) for the select() to return with an error code, which Cygwin pretty much ignores and continues plowing through. Sometimes the result is a complete hang on our SMP box. (I do not have details of the failures, but I suspect some kind of corruption in the internals of Winsock or Cygwin.) The enclosed patch uses a socketpair() instead of a single socket. The slave thread now selects for reading on one member of the pair; the master can close the other end to notify the slave it should fall out of select(). This fixes both of the problems: The behavior is now well-defined, and the race condition is avoided. After installing this patch, most of the Cygwin hangs we were experiencing are now gone. At least, I can't reproduce them in 20 seconds like I could before :-). - Pat P.S. You will get a compile-time warning because socketpair() is not declared. I was unable to find it in any .h file or I would have #included it... ====================================================================== --- cygwin-src/winsup/select.cc Wed Sep 22 23:57:31 1999 +++ cygwin-src-patl/winsup/select.cc Fri Sep 24 15:59:42 1999 @@ -991,7 +991,7 @@ struct socketinf { HANDLE thread; winsock_fd_set readfds, writefds, exceptfds; - SOCKET exitsock; + int exitsockpair[2]; select_record *start; }; @@ -1090,8 +1090,9 @@ thread_socket (void *arg) } } - if (WINSOCK_FD_ISSET (si->exitsock, &si->exceptfds)) - select_printf ("exitsock got an exception"); + if (WINSOCK_FD_ISSET (dtable[si->exitsockpair[1]]->get_handle (), + &si->readfds)) + select_printf ("exitsockpair was closed"); return 0; } @@ -1133,15 +1134,16 @@ start_thread_socket (select_record *me, } } - if ((si->exitsock = socket (PF_INET, SOCK_STREAM, 0)) == INVALID_SOCKET) + if (socketpair (PF_INET, SOCK_STREAM, 0, si->exitsockpair) != 0) { - __seterrno (); select_printf ("cannot create socket"); return -1; } - select_printf ("exitsock %p", si->exitsock); - WINSOCK_FD_SET ((HANDLE)si->exitsock, &si->exceptfds); + select_printf ("exitsocks %d %d", si->exitsockpair[0], + si->exitsockpair[1]); + WINSOCK_FD_SET (dtable[si->exitsockpair[1]]->get_handle (), + &si->readfds); stuff->device_specific[FHDEVN(FH_SOCKET)] = (void *) si; si->start = &stuff->start; select_printf ("stuff_start %p", &stuff->start); @@ -1157,8 +1159,9 @@ socket_cleanup (select_record *me, selec select_printf ("si %p si->thread %p", si, si ? si->thread : NULL); if (si && si->thread) { - closesocket (si->exitsock); + _close (si->exitsockpair[0]); WaitForSingleObject (si->thread, INFINITE); + _close (si->exitsockpair[1]); CloseHandle (si->thread); stuff->device_specific[FHDEVN(FH_SOCKET)] = NULL; delete si; -- Want to unsubscribe from this list? Send a message to cygwin-unsubscribe AT sourceware DOT cygnus DOT com