Mailing-List: contact cygwin-developers-help AT sourceware DOT cygnus DOT com; run by ezmlm List-Subscribe: List-Archive: List-Post: List-Help: , Sender: cygwin-developers-owner AT sources DOT redhat DOT com Delivered-To: mailing list cygwin-developers AT sources DOT redhat DOT com Date: Thu, 6 Sep 2001 12:49:35 -0400 From: Christopher Faylor To: cygwin-developers AT cygwin DOT com Subject: Re: I know CVS is hosed Message-ID: <20010906124935.A31212@redhat.com> Reply-To: cygwin-developers AT cygwin DOT com Mail-Followup-To: cygwin-developers AT cygwin DOT com References: <20010906113843 DOT A30174 AT redhat DOT com> <1614189623 DOT 20010906194903 AT logos-m DOT ru> <20010906115913 DOT A30555 AT redhat DOT com> Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20010906115913.A30555@redhat.com> User-Agent: Mutt/1.3.21i On Thu, Sep 06, 2001 at 11:59:13AM -0400, Christopher Faylor wrote: >On Thu, Sep 06, 2001 at 07:49:03PM +0400, egor duda wrote: >>Hi! >> >>Thursday, 06 September, 2001 Christopher Faylor cgf AT redhat DOT com wrote: >> >>CF> While attempting to cut down on the size of the 1.3.3 DLL, I uncovered a few >>CF> problems with cygheap. I couldn't track them down before I went to bed. >> >>CF> The symptom is that applications die in cfree. The problem manifests quickly >>CF> in a process which execs a process which execs a process. >> >>CF> I haven't seen the problem that Egor reported with free but it probably is not >>CF> related to the cygheap problem. It probably is somehow related to the new >>CF> code in sigproc which allocates the zombie array dynamically. >> >>yes, it crashes at 'free (zombies)'. i've built malloc-debuggibg version >>but somehow it doesn't get to free(zombies), but crash earlier at >>_cmalloc () when one of buckets contains invalid nonzero pointer. > >Yep. The _cmalloc stuff is what I'm going to track down. Just to summarize what is going wrong in cygwin: Cygwin has recently adopted something called the "cygwin heap". This is an internal heap that is inherited by forked/execed children. It consists of process specific information that should be inherited. So things like the file descriptor table, the current working directory, and the chroot value live there. It is also used to pass argv information to a child process. That's the problem. If you allocate space for argv on the heap and then exec a process the child process (1) will happily use the space in the heap. But what happens when that process execs another process (2)? The space used by child process (1) still is being used in child process (2) but it is basically just a memory leak. What is supposed to happen is that memory used by child process 1 is tagged in such a way that child process 2 will know to delete it. This is in cygheap_fixup_in_child. AFAICT, this really wasn't actually working for a while. One problem is that the mechanism which kept track of the freed space (buckets) wasn't being propagated to exec'ed children. This resulted in a potentially huge memory leak. So, I moved the bucket array from cygwin's data area into the cygwin heap where it will be propagated automatically. That's where everything broke down. For some reason it looks like things were being doubly freed in the cygheap_fixup_in_child cleanups, resulting in the SEGVs that we're seeing. I think I have a "fix" for this (amazing what comes to you while you are sleeping) but I don't really understand why the previous code wasn't working. So, I'll be doing some studying in between meetings today to try to convince myself that I understand what was going wrong before. I don't like fixing things by merely changing an algorithm unless I understand what was wrong with the old algorithm. In the meantime, if you could check out the enclosed patch and verify or disappoint, I'd be grateful. For the curious, most of the patch is actually just replacing cygheap->buckets with a macro so that I could switch back and forth between the static array and the cygheap->buckets. cgf Index: cygheap.cc =================================================================== RCS file: /cvs/uberbaum/winsup/cygwin/cygheap.cc,v retrieving revision 1.37 diff -p -r1.37 cygheap.cc *** cygheap.cc 2001/09/06 03:39:18 1.37 --- cygheap.cc 2001/09/06 16:47:38 *************** struct cygheap_entry *** 36,41 **** --- 36,43 ---- }; #define NBUCKETS (sizeof (cygheap->buckets) / sizeof (cygheap->buckets[0])) + #define cygbuckets cygheap->buckets + #define N0 ((_cmalloc_entry *) NULL) #define to_cmalloc(s) ((_cmalloc_entry *) (((char *) (s)) - (int) (N0->data))) *************** cygheap_fixup_in_child (child_info *ci, *** 136,142 **** for (_cmalloc_entry *rvc = cygheap->chain; rvc; rvc = rvc->prev) { cygheap_entry *ce = (cygheap_entry *) rvc->data; ! if (rvc->b >= NBUCKETS || ce->type <= HEAP_1_START) continue; else if (ce->type < HEAP_1_MAX) ce->type += HEAP_1_MAX; /* Mark for freeing after next exec */ --- 138,144 ---- for (_cmalloc_entry *rvc = cygheap->chain; rvc; rvc = rvc->prev) { cygheap_entry *ce = (cygheap_entry *) rvc->data; ! if (ce->type < HEAP_1_START) continue; else if (ce->type < HEAP_1_MAX) ce->type += HEAP_1_MAX; /* Mark for freeing after next exec */ *************** _cmalloc (int size) *** 200,209 **** continue; cygheap_protect->acquire (); ! if (cygheap->buckets[b]) { ! rvc = (_cmalloc_entry *) cygheap->buckets[b]; ! cygheap->buckets[b] = rvc->ptr; rvc->b = b; } else --- 202,211 ---- continue; cygheap_protect->acquire (); ! if (cygbuckets[b]) { ! rvc = (_cmalloc_entry *) cygbuckets[b]; ! cygbuckets[b] = rvc->ptr; rvc->b = b; } else *************** static void __stdcall *** 223,232 **** _cfree (void *ptr) { cygheap_protect->acquire (); _cmalloc_entry *rvc = to_cmalloc (ptr); DWORD b = rvc->b; ! rvc->ptr = cygheap->buckets[b]; ! cygheap->buckets[b] = (char *) rvc; cygheap_protect->release (); } --- 225,235 ---- _cfree (void *ptr) { cygheap_protect->acquire (); + ((cygheap_entry *) ptr)->type = HEAP_FREE; _cmalloc_entry *rvc = to_cmalloc (ptr); DWORD b = rvc->b; ! rvc->ptr = cygbuckets[b]; ! cygbuckets[b] = (char *) rvc; cygheap_protect->release (); }