Date: Fri, 6 Sep 2002 20:17:13 +0400
From: egor duda
To: Christopher Faylor
Subject: Re: hang in sig_wait waiting for debug lock

Hi!

Friday, 06 September, 2002 Christopher Faylor cgf AT redhat DOT com wrote:

>>Changelog states, however, that setclexec stuff isn't needed. Yet i
>>can't see why we shouldn't process protected handle list as long as we
>>recreating handles during set-close-on-exec operation. Can you give a
>>comment?

CF> I assume that you mean this entry:

CF> 2002-07-14  Christopher Faylor
CF>     * dcrt0.cc (dll_crt0_1): Move debug_init call back to here.  Avoid a
CF>     compiler warning.
CF>     * shared.cc (memory_init): Remove debug_init call.
CF>     * debug.h (handle_list): Change "clexec" to "inherited".
CF>     * debug.cc: Remove a spurious declaration.
CF>     (setclexec): Conditionalize away since it is currently unused.
CF>     (add_handle): Use inherited field rather than clexec.
CF>     (debug_fixup_after_fork_exec): Ditto.  Move debugging output to
CF>     delete_handle.
CF>     (delete_handle): Add debugging output.
CF>     * fhandler.cc (fhandler_base::set_inheritance): Don't bother setting
CF>     inheritance in debugging table since the handle was never protected
CF>     anyway.
CF>     (fhandler_base::fork_fixup): Ditto.

CF> I'm at a loss to understand why adding additional things into the
CF> protected handle table would solve a race.

I thought about it again, and here's a hypothesis of what may be
happening. I suspect that it's not exactly a race. That is, it's caused
not by randomness in the order in which different threads of control are
executed, but by randomness in which handle values are allocated by the
OS. If the value of some handle allocated in one process is equal to the
value of a handle we were dealing with in another, we may get warnings
from add_handle().

system_printf() pumps data to STD_ERROR_HANDLE, which is possibly a pipe
to the tty master. Handling data in the tty master thread is quite
complicated, and it may get to the same add_handle(), but with the muto
already locked. Normally that's not a big problem, since system_printf()
will return asynchronously to the tty master and unlock the mutex.

But here we have the second nasty random thing that may happen: the pipe
may fill up. In this case WriteFile in system_printf() blocks until the
master drains the data from the pipe, and the master may be blocked
because it wants to protect a handle but the debug muto is locked.

I've noticed a special here.unlock() before debug_printf() in
add_handle(). Could it be that it was added there for similar reasons?
If not, then it's not clear why we should unlock the muto explicitly
when it will be unlocked on the next line anyway, when the 'return'
statement is executed.

CF> There are too many places where the fd handle is manipulated but
CF> not protected for this code to be turned on. And since there is
CF> no easy way to get distinct handle name information into the
CF> table, it wouldn't make sense to add the protection anyway.

Egor.            mailto:deo AT logos-m DOT ru ICQ 5165414 FidoNet 2:5020/496.19
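
To make the here.unlock() question above concrete, here is a minimal
sketch of the pattern, assuming a plain std::mutex in place of the
Cygwin muto. It is not the actual debug.cc code: add_handle_sketch,
debug_lock, protected_handles and debug_lock_is_free are hypothetical
names used only for illustration, and the warn callback stands in for
system_printf(). The point it shows is that the lock is dropped before
any output that might block on a full pipe, so a reader that itself
needs the lock can still make progress.

```cpp
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for the debug muto; not Cygwin's muto class.
static std::mutex debug_lock;

// Hypothetical table of protected handle values.
static std::vector<int> protected_handles;

// Returns true if the handle was added, false if it was already present.
// `warn` stands in for system_printf(): it may block (for example on a
// full pipe), so it is called only after the lock has been released.
bool add_handle_sketch(int h,
                       const std::function<void(const std::string &)> &warn)
{
  std::unique_lock<std::mutex> here(debug_lock);
  for (int known : protected_handles)
    if (known == h)
      {
        here.unlock();  // release before potentially blocking output
        warn("handle " + std::to_string(h) + " already protected");
        return false;
      }
  protected_handles.push_back(h);
  return true;          // `here` unlocks automatically on return
}

// Reports whether the lock is free at the moment of the call.
// Must not be called by a thread that currently holds debug_lock.
bool debug_lock_is_free()
{
  if (debug_lock.try_lock())
    {
      debug_lock.unlock();
      return true;
    }
  return false;
}
```

If the explicit unlock were removed, the warning would be written while
the lock is still held, which is the situation the hypothesis above
describes: the writer waits for the pipe to drain while the reader
waits for the lock.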