Mailing-List: contact cygwin-developers-help AT sourceware DOT cygnus DOT com; run by ezmlm List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-developers-owner AT sources DOT redhat DOT com Delivered-To: mailing list cygwin-developers AT sources DOT redhat DOT com Message-ID: <8F23E55D511AD5119A6800D0B76FDDE101A26A49@cpex3.channelpoint.com> From: Troy Noble To: "'cygwin-developers AT cygwin DOT com'" Subject: RE: I know CVS is hosed Date: Thu, 6 Sep 2001 11:14:15 -0600 MIME-Version: 1.0 X-Mailer: Internet Mail Service (5.5.2653.19) Content-Type: text/plain; charset="iso-8859-1" X-Scanned-By: MIMEDefang 1.0 (http://www.roaringpenguin.com/mimedefang/) Ding ding ding ding... we have a winner. I don't know if this will help at all, hopefully won't come across as a "me-too" post. Remember I reported a while back that I had a problem with cygheap.cc with the buckets code when forking lots of processes, that it would give me a Dr Watson periodically... but was sporadic and hard to reproduce? Specifically the line of code was (at that time) in cygheap.cc:102 which is now line 206 of the current CVS cygheap.cc #0 _cmalloc (size=6) at /work/build/src/winsup/cygwin/cygheap.cc:102 99 if (buckets[b]) 100 { 101 rvc = (_cmalloc_entry *) buckets[b]; 102 buckets[b] = rvc->ptr; 103 rvc->b = b; 104 } (gdb) p *rvc Cannot access memory at address 0x6 That looks like an awfully familiar section of code (it's in your patch)! It was beyond my meager knowledge of cygwin internals at this stage to debug it, but I do have some more complete stack traces lying around somewhere if you're curious about how the failure mode manifested. But they are old and probably of little use, since it sounds like you have already come up with a way to repeatably reproduce it. I'm so glad you found this. I wish I had an hour or two right now so I could really sit and study the code to see what it's doing, then I could maybe talk intelligently about whether your patch fixes it for good. I'd certainly be willing to test it though to see if I can make it die. I'll compile it up with latest CVS and run it for a day or two. Maybe tonight I can look at the code more closely too. And confirmed on the part about it having been there for a while. I first started seeing this problem when I upgrade from 1.1.4 to 1.1.8-1. Boy am I glad now I started reading the developer list! Troy -----Original Message----- From: Christopher Faylor [mailto:cgf AT redhat DOT com] Sent: Thursday, September 06, 2001 10:50 AM To: cygwin-developers AT cygwin DOT com Subject: Re: I know CVS is hosed On Thu, Sep 06, 2001 at 11:59:13AM -0400, Christopher Faylor wrote: >On Thu, Sep 06, 2001 at 07:49:03PM +0400, egor duda wrote: >>Hi! >> >>Thursday, 06 September, 2001 Christopher Faylor cgf AT redhat DOT com wrote: >> >>CF> While attempting to cut down on the size of the 1.3.3 DLL, I uncovered a few >>CF> problems with cygheap. I couldn't track them down before I went to bed. >> >>CF> The symptom is that applications die in cfree. The problem manifests quickly >>CF> in a process which execs a process which execs a process. >> >>CF> I haven't seen the problem that Egor reported with free but it probably is not >>CF> related to the cygheap problem. It probably is somehow related to the new >>CF> code in sigproc which allocates the zombie array dynamically. >> >>yes, it crashes at 'free (zombies)'. i've built malloc-debuggibg version >>but somehow it doesn't get to free(zombies), but crash earlier at >>_cmalloc () when one of buckets contains invalid nonzero pointer. > >Yep. The _cmalloc stuff is what I'm going to track down. Just to summarize what is going wrong in cygwin: Cygwin has recently adopted something called the "cygwin heap". This is an internal heap that is inherited by forked/execed children. It consists of process specific information that should be inherited. So things like the file descriptor table, the current working directory, and the chroot value live there. It is also used to pass argv information to a child process. That's the problem. If you allocate space for argv on the heap and then exec a process the child process (1) will happily use the space in the heap. But what happens when that process execs another process (2)? The space used by child process (1) still is being used in child process (2) but it is basically just a memory leak. What is supposed to happen is that memory used by child process 1 is tagged in such a way that child process 2 will know to delete it. This is in cygheap_fixup_in_child. AFAICT, this really wasn't actually working for a while. One problem is that the mechanism which kept track of the freed space (buckets) wasn't being propagated to exec'ed children. This resulted in a potentially huge memory leak. So, I moved the bucket array from cygwin's data area into the cygwin heap where it will be propagated automatically. That's where everything broke down. For some reason it looks like things were being doubly freed in the cygheap_fixup_in_child cleanups, resulting in the SEGVs that we're seeing. I think I have a "fix" for this (amazing what comes to you while you are sleeping) but I don't really understand why the previous code wasn't working. So, I'll be doing some studying in between meetings today to try to convince myself that I understand what was going wrong before. I don't like fixing things by merely changing an algorithm unless I understand what was wrong with the old algorithm. In the meantime, if you could check out the enclosed patch and verify or disappoint, I'd be grateful. For the curious, most of the patch is actually just replacing cygheap->buckets with a macro so that I could switch back and forth between the static array and the cygheap->buckets. cgf Index: cygheap.cc =================================================================== RCS file: /cvs/uberbaum/winsup/cygwin/cygheap.cc,v retrieving revision 1.37 diff -p -r1.37 cygheap.cc *** cygheap.cc 2001/09/06 03:39:18 1.37 --- cygheap.cc 2001/09/06 16:47:38 *************** struct cygheap_entry *** 36,41 **** --- 36,43 ---- }; #define NBUCKETS (sizeof (cygheap->buckets) / sizeof (cygheap->buckets[0])) + #define cygbuckets cygheap->buckets + #define N0 ((_cmalloc_entry *) NULL) #define to_cmalloc(s) ((_cmalloc_entry *) (((char *) (s)) - (int) (N0->data))) *************** cygheap_fixup_in_child (child_info *ci, *** 136,142 **** for (_cmalloc_entry *rvc = cygheap->chain; rvc; rvc = rvc->prev) { cygheap_entry *ce = (cygheap_entry *) rvc->data; ! if (rvc->b >= NBUCKETS || ce->type <= HEAP_1_START) continue; else if (ce->type < HEAP_1_MAX) ce->type += HEAP_1_MAX; /* Mark for freeing after next exec */ --- 138,144 ---- for (_cmalloc_entry *rvc = cygheap->chain; rvc; rvc = rvc->prev) { cygheap_entry *ce = (cygheap_entry *) rvc->data; ! if (ce->type < HEAP_1_START) continue; else if (ce->type < HEAP_1_MAX) ce->type += HEAP_1_MAX; /* Mark for freeing after next exec */ *************** _cmalloc (int size) *** 200,209 **** continue; cygheap_protect->acquire (); ! if (cygheap->buckets[b]) { ! rvc = (_cmalloc_entry *) cygheap->buckets[b]; ! cygheap->buckets[b] = rvc->ptr; rvc->b = b; } else --- 202,211 ---- continue; cygheap_protect->acquire (); ! if (cygbuckets[b]) { ! rvc = (_cmalloc_entry *) cygbuckets[b]; ! cygbuckets[b] = rvc->ptr; rvc->b = b; } else *************** static void __stdcall *** 223,232 **** _cfree (void *ptr) { cygheap_protect->acquire (); _cmalloc_entry *rvc = to_cmalloc (ptr); DWORD b = rvc->b; ! rvc->ptr = cygheap->buckets[b]; ! cygheap->buckets[b] = (char *) rvc; cygheap_protect->release (); } --- 225,235 ---- _cfree (void *ptr) { cygheap_protect->acquire (); + ((cygheap_entry *) ptr)->type = HEAP_FREE; _cmalloc_entry *rvc = to_cmalloc (ptr); DWORD b = rvc->b; ! rvc->ptr = cygbuckets[b]; ! cygbuckets[b] = (char *) rvc; cygheap_protect->release (); }