Mailing-List: contact cygwin-developers-help AT sourceware DOT cygnus DOT com; run by ezmlm
Sender: cygwin-developers-owner AT sources DOT redhat DOT com
Delivered-To: mailing list cygwin-developers AT sources DOT redhat DOT com
Date: Fri, 13 Jul 2001 20:13:39 -0400
From: Christopher Faylor
To: Cygwin-Developers
Subject: Re: 2001-06-28 CVS ash Background Win32 Process Hang Problem
Message-ID: <20010713201339.A11377@redhat.com>
Reply-To: cygwin-developers AT cygwin DOT com
Mail-Followup-To: Cygwin-Developers
References: <20010710162811 DOT D320 AT dothill DOT com> <20010713150905 DOT A282 AT dothill DOT com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.3.11i
In-Reply-To: <20010713150905.A282@dothill.com>; from Jason.Tishler@dothill.com on Fri, Jul 13, 2001 at 03:09:05PM -0400

On Fri, Jul 13, 2001 at 03:09:05PM -0400, Jason Tishler wrote:
>I have some more (hopefully useful) information...
>
>On Tue, Jul 10, 2001 at 04:28:11PM -0400, Jason Tishler wrote:
>> The problem does not occur in 1.3.2 -- so it has been introduced since
>> that release.
>
>It appears that this problem was introduced somewhere between 1.3.2
>(which uname -a indicates was built on 2001-05-20 23:28) and when the
>2001-05-22 snapshot was built.  Unfortunately, even after reviewing the
>CVS commit mailing list, I could not find the culprit.
>
>Anyway, I've attached two strace logs -- the first exhibits the hang,
>then second does not.
>The first one is produced by:
>
>    $ strace -o hang.log sh hang.sh
>
>and the second by:
>
>    $ strace -o nohang.log bash hang.sh
>
>Here is an interesting snippet from hang.log:
>
> 1262 248348 [main] sh 428 spawn_guts: spawn_guts null_app_name 0 (c:\WINNT\system32\notepad.exe, c:\WINNT\system32\notepad.exe)
> 1701 250049 [main] sh 428 spawn_guts: 463 = spawn_guts (/mnt/c/WINNT/system32/notepad, c:\WINNT\system32\notepad.exe)
>  276 250325 [main] sh 428 proc_subproc: args: 1, 37876916
>  189 250514 [main] sh 428 proc_subproc: added pid 463 to wait list, slot 0, winpid 0x1CF, handle 0x344
>  171 250685 [main] sh 428 proc_subproc: returning 1
>  166 250851 [main] sh 428 spawn_guts: spawned windows pid 463
>  178 251029 [proc] sh 428 wait_subproc: looping
>
>XXX hangs here XXX
>
>55124741 55375770 [proc] sh 428 proc_subproc: args: 2, 0
>  318 55376088 [proc] sh 428 proc_subproc: pid 463[0] terminated, handle 0x344, nchildren 1, nzombies 0
>  220 55376308 [proc] sh 428 proc_subproc: zombifying [0], pid 463, handle 0x344, nchildren 1
>  171 55376479 [proc] sh 428 proc_subproc: returning 1
>  161 55376640 [proc] sh 428 sig_send: pid 428, signal 20, its_me 1
>  164 55376804 [proc] sh 428 sig_send: Not waiting for sigcomplete. its_me 1 signal 20
>  162 55376966 [proc] sh 428 sig_send: returning 0 from sending signal 20
>  159 55377125 [proc] sh 428 wait_subproc: looping
>  171 55377296 [main] sh 428 spawn_guts: subprocess exited
>
>Note that when the hang returns there appears to be some unusually large
>numbers in column one and two which seem to indicate uninitialized or
>corrupted memory.
>
>Also attached is a gdb session obtained by attaching to the hung sh.exe.
>It appears that Cygwin is hung on WaitForMultipleObjects(), but nwait is
>also a huge number...

It's unlikely that this is a huge number.  It's more likely that gdb is
confused about the value.

The problem is due to a bug in vfork.  I thought I'd reported that in
the past.

cgf
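
Archive note: the large values in the quoted hang.log snippet can be checked
arithmetically. The sketch below assumes that column one is the number of
microseconds elapsed since the previous log line and that column two is a
running microsecond timestamp. That layout is an assumption, not something
stated in this thread, and the helper names (parse, check_deltas,
longest_gap) are hypothetical. Under that assumption the figures add up
(251029 + 55124741 = 55375770), so the large numbers would reflect a wait of
roughly 55 seconds rather than corrupted memory.

```python
import re

# Assumed line layout: "<delta_us> <timestamp_us> [thread] rest-of-message".
LINE_RE = re.compile(r"^\s*(\d+)\s+(\d+)\s+\[(\w+)\]\s+(.*)$")

# Three consecutive lines copied from the quoted hang.log snippet.
SAMPLE = [
    "178 251029 [proc] sh 428 wait_subproc: looping",
    "55124741 55375770 [proc] sh 428 proc_subproc: args: 2, 0",
    "318 55376088 [proc] sh 428 proc_subproc: pid 463[0] terminated, "
    "handle 0x344, nchildren 1, nzombies 0",
]


def parse(line):
    """Return (delta_us, timestamp_us, thread, message), or None if no match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2)), m.group(3), m.group(4)


def check_deltas(lines):
    """For each line after the first, report whether the previous timestamp
    plus this line's delta equals this line's timestamp."""
    rows = [r for r in map(parse, lines) if r is not None]
    results = []
    for prev, cur in zip(rows, rows[1:]):
        delta, timestamp = cur[0], cur[1]
        results.append((delta, timestamp, prev[1] + delta == timestamp))
    return results


def longest_gap(lines):
    """Largest delta (in microseconds) among the checked lines."""
    return max(delta for delta, _, _ in check_deltas(lines))


if __name__ == "__main__":
    for delta, timestamp, ok in check_deltas(SAMPLE):
        print(delta, timestamp, "consistent" if ok else "INCONSISTENT")
    print("longest gap in seconds:", longest_gap(SAMPLE) / 1_000_000)
```

Running the same check over a full log would flag any line whose two columns
disagree, which is what actual corruption of the timing data would be
expected to look like.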